Abstract

We present a universal algorithm for online trading in the Stock Market which performs asymptotically at least as well as any stationary trading strategy that computes the investment at each step using a fixed function of the side information that belongs to a given RKHS (Reproducing Kernel Hilbert Space). Using a universal kernel, we extend this result to any continuous stationary strategy. In this learning process, a trader rationally chooses his gambles using predictions made by a randomized well-calibrated algorithm. Our strategy is based on Dawid's notion of calibration with more general checking rules and on a modification of Kakade and Foster's randomized rounding algorithm for computing well-calibrated forecasts. We combine the method of randomized calibration with Vovk's method of defensive forecasting in RKHS. Unlike in statistical theory, no stochastic assumptions are made about the stock prices. Our empirical results on historical markets provide strong evidence that this type of technical trading can "beat the market" if transaction costs are ignored.

Keywords: asymptotic calibration, defensive forecasting, online trading, reproducing kernel Hilbert space, universal kernel, universal trading strategy, stationary trading strategy with side information



1. Introduction 



Predicting sequences is a key problem in machine learning, computational finance and statistics. These predictions can serve as a basis for developing efficient methods for playing financial games in the Stock Market.

The learning process proceeds as follows: observing a finite-state sequence given online, a forecaster assigns a subjective estimate to future states.

A minimal requirement for testing any prediction algorithm is that it should be calibrated (cf. Dawid 1982). Dawid gave an informal explanation of calibration for binary outcomes. Let a sequence $\omega_1, \omega_2, \dots, \omega_{n-1}$ of binary outcomes be observed by a forecaster whose task is to give a probability $p_n$ of a future event $\omega_n = 1$. In a typical example, $p_n$ is interpreted as the probability that it will rain. The forecaster is said to be well-calibrated if it rains as often as he leads us to expect. It should rain on about 80% of the days for which $p_n = 0.8$, and so on.

A more precise definition is as follows. Let $I(p)$ denote the characteristic function of a subinterval $I \subseteq [0,1]$, i.e., $I(p) = 1$ if $p \in I$ and $I(p) = 0$ otherwise. An infinite sequence of forecasts $p_1, p_2, \dots$ is calibrated for an infinite binary sequence of outcomes $\omega_1, \omega_2, \dots$ if for the characteristic function $I(p)$ of any subinterval of $[0,1]$ the calibration error tends to zero, i.e.,

$$\frac{1}{n}\sum_{i=1}^{n} I(p_i)(\omega_i - p_i) \to 0$$

as $n \to \infty$. The indicator function $I(p_i)$ determines some "checking rule" that selects the indices $i$ at which we compute the deviation between forecasts $p_i$ and outcomes $\omega_i$.
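As an illustration of this definition, the calibration error for a single checking interval can be computed as in the following Python sketch. The function name `calibration_error`, the half-open interval convention $[lo, hi)$ and the toy data are assumptions made only for this example; they are not part of the forecasting algorithm itself.

```python
def calibration_error(forecasts, outcomes, lo, hi):
    """Return (1/n) * sum_i I(p_i) * (omega_i - p_i), where I is the
    characteristic function of the interval [lo, hi)."""
    n = len(forecasts)
    total = 0.0
    for p, omega in zip(forecasts, outcomes):
        if lo <= p < hi:          # the checking rule selects this index
            total += omega - p    # deviation between outcome and forecast
    return total / n


# Toy data: the forecast "80% chance of rain" is issued on 10 days
# and it actually rains on 8 of them.
forecasts = [0.8] * 10
outcomes = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
err = calibration_error(forecasts, outcomes, 0.75, 0.85)
print(abs(err) < 1e-12)  # the forecaster is calibrated on this interval
```

On this toy sequence the deviations cancel, which is exactly the informal requirement that it rains on about 80% of the days for which the forecast is 0.8.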



If the weather acts adversarially, then, as shown by Oakes (1985) and Dawid (1985), any deterministic forecasting algorithm will not always be calibrated.

Foster and Vohra (1998) show that calibration is almost surely guaranteed with a randomizing forecasting rule, i.e., where the forecasts $p_i$ are chosen using internal randomization and the forecasts are hidden from the weather until the weather makes its decision whether to rain or not.

The origin of the calibration algorithms is the Blackwell (1956) approachability theorem but, as its drawback, the forecaster has to use linear programming to compute the forecasts. We modify and generalize a more computationally efficient method from Kakade and Foster (2004), where "an almost deterministic" randomized rounding universal forecasting algorithm is presented. For any sequence of outcomes $\omega_1, \omega_2, \dots$ and for any precision of rounding $\Delta > 0$, an observer can simply randomly round the deterministic forecast $p_i$ up to $\Delta$ to a random forecast $\tilde p_i$ in order to calibrate for this sequence with probability one:

$$\limsup_{n\to\infty}\left|\frac{1}{n}\sum_{i=1}^{n} I(\tilde p_i)(\omega_i - \tilde p_i)\right| \le \Delta, \tag{1}$$

where $I(p)$ is the characteristic function of any subinterval of $[0,1]$. This algorithm can be easily generalized such that the calibration error tends to zero as $n \to \infty$.

Kakade and Foster and others considered a finite outcome space and a probability distribution as the forecast. In this paper, the outcomes $\omega_i$ are real numbers from the unit interval $[0,1]$ and the forecast $p_i$ is a single real number (which can be the output of a random variable). This setting is closely related to Vovk's (2005a) defensive forecasting approach (see below).

In this case the real-valued predictions $p_i \in [0,1]$ can be interpreted as mean values of future outcomes under some probability distributions on $[0,1]$ that are unknown to us. We do not know the precise form of such distributions; we should predict only the future means.

The well-known applications of the method of calibration belong to different fields of game theory and machine learning. Kakade and Foster proved that the empirical frequencies of play in any normal-form game with finite strategy sets converge to a set of correlated equilibria if each player chooses his gamble as the best response to the well-calibrated forecasts.

In Vovk et al. (2005), Vovk (2005a), Vovk (2006), Vovk (2006a) and Vovk (2007), Vovk developed the method of calibration for the case of more general RKHS and Banach spaces. Vovk called his method defensive forecasting (DF). He also applied his method to recovering unknown functional dependencies presented by arbitrary functions from RKHS and Banach spaces. Chernov et al. (2010) show that well-calibrated forecasts can be used to compute predictions for the Vovk (1997) aggregating algorithm. In defensive forecasting, continuous loss (gain) functions are considered.

In this paper we present a new application of the method of calibration. We construct "a universal" strategy for online trading in the Stock Market which performs asymptotically at least as well as any not "too complex" trading strategy $D$. Technically, we are interested in the case where the trading strategy $D$ is assumed to belong to a large reproducing kernel Hilbert space (to be defined shortly) and the complexity of $D$ is measured by its norm. Using a universal kernel, we extend this result to any continuous stationary trading strategy. Our universal trading strategy is represented by a discontinuous function, though it uses randomization.

The problem of constructing universal strategies for online trading in the Stock Market is popular in machine learning. The worst-case study of universal trading was introduced by Cover (1991). Unlike many authors, we consider the simplest case: online trading with only one stock. These results can be generalized to the case of several stocks and to dynamical portfolio hedging in the sense of the framework proposed by Cover and Ordentlich (1996).

We consider a game with two players: Stock Market and Trader. At the beginning of each round $i$ Trader is shown an object $x_i$ which contains side information. Past prices of the stock $S_1, \dots, S_{i-1}$ are also given to Trader (they can be considered as a part of the side information). Using this information, Trader announces a number $M_i$ of shares of the stock he wants to purchase at price $S_{i-1}$ each. At the end of round $i$ Stock Market announces the price $S_i$ of the stock, and Trader receives his gain or suffers loss $M_i(S_i - S_{i-1})$ for round $i$. The total gain or loss for the first $n$ rounds is equal to $\sum_{i=1}^{n} M_i(S_i - S_{i-1})$.

We show that, using the well-calibrated forecasts, it is possible to construct a universal strategy for online trading in the Stock Market which performs asymptotically at least as well as any stationary trading strategy presented by a continuous function $D$ of the object $x_i$. This universal trading strategy is of decision type: we buy or sell only one share of the stock at each round. The learning process is the most traditional one. At each step, Trader makes a randomized prediction $\tilde p_i$ of the future price $S_i$ of the stock and takes the best response to this prediction. He chooses a strategy: dealing for a rise, $\tilde M_i = 1$ if $\tilde p_i > \tilde S_{i-1}$, or dealing for a fall, $\tilde M_i = -1$ otherwise, where $\tilde S_{i-1}$ is the randomized past price of the stock. Trader uses a randomized algorithm for computing the well-calibrated forecasts $\tilde p_i$.

Therefore, our universal strategy uses some internal randomization. Unlike in statistical theory, no stochastic assumptions are made about the evolution of stock prices.

Our main results, Theorems 2 and 5 (Section 4) and Theorem 7 (Section 5), say that this trading strategy $\tilde M_i$ performs asymptotically at least as well as any stationary trading strategy presented by a continuous function $D(x)$. With probability one, the gain of this trading strategy is asymptotically not less than the average gain of any stationary trading strategy from one share of the stock:

$$\liminf_{n\to\infty}\frac{1}{n}\left(\sum_{i=1}^{n}\tilde M_i(S_i - S_{i-1}) - \|D\|_\infty^{-1}\sum_{i=1}^{n} D(x_i)(S_i - S_{i-1})\right) \ge 0,$$

where $x_i$ is the side information used by the stationary trading strategy $D$ at step $i$ and $\|D\|_\infty = \sup_x |D(x)|$.

To achieve this goal we extend, in Theorem 1 (Section 3), Kakade and Foster's forecasting algorithm to the case of arbitrary real-valued outcomes and to a more general notion of calibration with changing parameterized checking rules. We combine it with the defensive forecasting method in RKHS of Vovk et al. (2005) (cf. Vovk 2005a). In Section 5, using a universal kernel, we generalize this result to any continuous stationary trading strategy. We show in Section 6 that the universality property fails if we consider discontinuous trading strategies. On the other hand, we show in Theorem 9 that a universal trading strategy exists for a class of randomized discontinuous trading strategies.

In Section 7, results of numerical experiments are presented. Our empirical results on historical markets provide strong evidence that this type of online trading can beat the market: our universal strategy is always better than the "buy-and-hold" strategy for each stock chosen arbitrarily in the Stock Market. This strategy also outperforms an online trading strategy using a standard prediction algorithm (ARMA).



2. Preliminaries 

By a kernel function on a set $X$ we mean any function $K(x,y)$ which can be represented as a dot product $K(x,y) = (\Phi(x)\cdot\Phi(y))$, where $\Phi$ is a mapping from $X$ to some Hilbert feature space.

The reproducing kernels are of special interest. A Hilbert space $\mathcal{F}$ of real-valued functions on a compact metric space $X$ is called an RKHS (Reproducing Kernel Hilbert Space) on $X$ if the evaluation functional $f \to f(x)$ is continuous for each $x \in X$. Let $\|\cdot\|_{\mathcal{F}}$ be the norm in $\mathcal{F}$ and $c_{\mathcal{F}}(x) = \sup_{\|f\|_{\mathcal{F}} \le 1} |f(x)|$. The embedding constant of $\mathcal{F}$ is defined as $c_{\mathcal{F}} = \sup_{x} c_{\mathcal{F}}(x)$. We consider RKHS $\mathcal{F}$ with $c_{\mathcal{F}} < \infty$.
Let $X = [0,1]^m$ for $m \ge 1$. An example of an RKHS is the Sobolev space $\mathcal{F} = H^1([0,1])$, which consists of absolutely continuous functions $f : [0,1] \to \mathcal{R}$ with $\|f\|_{\mathcal{F}} < \infty$, where $\|f\|_{\mathcal{F}} = \sqrt{\int_0^1 (f(t))^2\,dt + \int_0^1 (f'(t))^2\,dt}$. For this space, $c_{\mathcal{F}} = \sqrt{\coth 1}$ (cf. Vovk 2005a).

Let $\mathcal{F}$ be an RKHS on $X$ with the dot product $(f\cdot g)$ for $f, g \in \mathcal{F}$. By the Riesz-Fischer theorem, for each $x \in X$ there exists $k_x \in \mathcal{F}$ such that $f(x) = (k_x\cdot f)$.

The reproducing kernel is defined as $K(x,y) = (k_x\cdot k_y)$. The main properties of the kernel are: 1) $K(x,y) = K(y,x)$ for all $x, y \in X$ (symmetry property); 2) $\sum_{i,j=1}^{k}\alpha_i\alpha_j K(x_i,x_j) \ge 0$ for all $k$, for all $x_i \in X$, and for all real numbers $\alpha_i$, where $i = 1,\dots,k$ (positive semidefinite property).

Conversely, a kernel defines an RKHS: any symmetric, positive semidefinite kernel function $K(x,y)$ defines some canonical RKHS $\mathcal{F}$ and a mapping $\Phi : X \to \mathcal{F}$ such that $K(x,y) = (\Phi(x)\cdot\Phi(y))$. Also, $c_{\mathcal{F}}(x) = \|k_x\|_{\mathcal{F}} = \|\Phi(x)\|_{\mathcal{F}}$. The mapping $\Phi(x)$ is also called a "feature map" (cf. Cristianini and Shawe-Taylor 2000, Chapter 3).

A function $f : X \to \mathcal{R}$ is induced by a kernel $K(x,y)$ if there exists an element $g \in \mathcal{F}$ such that $f(x) = (g\cdot\Phi(x))$. This definition is independent of the map $\Phi$. For any continuous kernel $K(x,y)$, every induced function $f$ is continuous (cf. Steinwart 2001). In what follows we consider continuous kernels. Therefore, all functions from the canonical RKHS $\mathcal{F}$ are continuous.

For the Sobolev space $H^1([0,1])$, the reproducing kernel is

$$K(t,t') = \cosh(\min(t,t'))\,\cosh(\min(1-t,1-t'))/\sinh 1$$

(cf. Vovk 2005a).
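The embedding constant of this space can be checked numerically. Since $c_{\mathcal{F}}(t) = \|k_t\|_{\mathcal{F}}$ and $K(t,t) = (k_t\cdot k_t)$, we have $c_{\mathcal{F}} = \sup_t \sqrt{K(t,t)}$. The following Python sketch evaluates this supremum on a grid; the function name `sobolev_kernel` and the grid size are assumptions made only for the illustration.

```python
import math


def sobolev_kernel(t, u):
    """Reproducing kernel of the Sobolev space H^1([0, 1])."""
    return math.cosh(min(t, u)) * math.cosh(min(1 - t, 1 - u)) / math.sinh(1)


# c_F = sup_t sqrt(K(t, t)), approximated on a uniform grid of [0, 1].
grid = [i / 1000 for i in range(1001)]
c_numeric = max(math.sqrt(sobolev_kernel(t, t)) for t in grid)

# Closed form: sqrt(coth 1), attained at the endpoints t = 0 and t = 1.
c_closed = math.sqrt(1 / math.tanh(1))

print(c_numeric, c_closed)
```

The maximum of $K(t,t)$ is attained at the endpoints of the interval, where it equals $\cosh 1/\sinh 1 = \coth 1$.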



Well-known examples of kernels on $X = [0,1]^m$: the Gaussian kernel $K(x,y) = \exp\{-\|x-y\|^2/\sigma^2\}$, where $\|\cdot\|$ is the Euclidean norm; $K(t,t') = \cos(\frac{\pi}{2}(t-t'))$, where $m = 1$ and $t,t' \in [0,1]$.

For other examples and details of kernel theory, see Schölkopf and Smola (2002).

A special kernel corresponds to the method of randomization defined below. A random variable $\tilde y$ is called a randomization of a real number $y \in [0,1]$ if $E(\tilde y) = y$, where $E$ is the symbol of mathematical expectation with respect to the probability distribution of $\tilde y$.

We use a specific method of randomization of real numbers from the unit interval proposed by Kakade and Foster (2004). Given a positive integer $K$, divide the interval $[0,1]$ into subintervals of length $\Delta = 1/K$ with rational endpoints $v_i = i\Delta$, where $i = 0, 1, \dots, K$. Let $V$ denote the set of these points. Any number $p \in [0,1]$ can be represented as a linear combination of the two neighboring endpoints from $V$ that define the subinterval containing $p$:

$$p = \sum_{v\in V} w_v(p)\,v = w_{v_{i-1}}(p)\,v_{i-1} + w_{v_i}(p)\,v_i, \tag{2}$$

where $p \in [v_{i-1}, v_i]$, $i = \lfloor p/\Delta + 1\rfloor$, $w_{v_{i-1}}(p) = 1 - (p - v_{i-1})/\Delta$, and $w_{v_i}(p) = 1 - (v_i - p)/\Delta$. Define $w_v(p) = 0$ for all other $v \in V$. Define a random variable

$$\tilde p = \begin{cases} v_{i-1} & \text{with probability } w_{v_{i-1}}(p),\\ v_i & \text{with probability } w_{v_i}(p). \end{cases}$$

Let $w(p) = (w_v(p) : v \in V)$ be the vector of probabilities of rounding.

For any $k$-dimensional vector $\bar x = (x_1,\dots,x_k) \in [0,1]^k$, we round each coordinate $x_s$, $s = 1,\dots,k$, to $v_{j_s-1}$ with probability $w_{v_{j_s-1}}(x_s)$ and to $v_{j_s}$ with probability $w_{v_{j_s}}(x_s)$, where $x_s \in [v_{j_s-1}, v_{j_s}]$. Let $\tilde{\bar x}$ be the corresponding random vector.

Let $v = (v^1,\dots,v^k) \in V^k$ and $W_v(\bar x) = \prod_{s=1}^{k} w_{v^s}(x_s)$. For any $\bar x$, let $W(\bar x) = (W_v(\bar x) : v \in V^k)$ be the vector of the probability distribution on $V^k$: $\sum_{v\in V^k} W_v(\bar x) = 1$. For $\bar x, \bar x' \in [0,1]^k$, the dot product $K(\bar x,\bar x') = (W(\bar x)\cdot W(\bar x'))$ is a symmetric positive semidefinite kernel function.
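The randomized rounding just described can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `rounding_weights`, `randomize` and `randomize_vector` are hypothetical, and the clamping of the index for $p = 1$ (where $\lfloor p/\Delta + 1\rfloor$ would point past the last grid point) is an assumption of the sketch.

```python
import random


def rounding_weights(p, K):
    """Return ((v_lo, w_lo), (v_hi, w_hi)): the two neighboring grid
    points of p in [0, 1] for step 1/K and their rounding probabilities."""
    delta = 1.0 / K
    # Index of the right endpoint; clamped so that p = 1 stays on the grid
    # (an assumption of this sketch).
    i = min(int(p / delta) + 1, K)
    v_lo, v_hi = (i - 1) * delta, i * delta
    w_lo = 1.0 - (p - v_lo) / delta
    w_hi = 1.0 - (v_hi - p) / delta
    return (v_lo, w_lo), (v_hi, w_hi)


def randomize(p, K, rng=random):
    """Round p to a neighboring grid point so that the expectation is p."""
    (v_lo, w_lo), (v_hi, _) = rounding_weights(p, K)
    return v_lo if rng.random() < w_lo else v_hi


def randomize_vector(x, K, rng=random):
    """Round each coordinate of the vector x independently."""
    return [randomize(x_s, K, rng) for x_s in x]


(v_lo, w_lo), (v_hi, w_hi) = rounding_weights(0.37, 10)
print(v_lo, w_lo, v_hi, w_hi)   # weights sum to 1 and reproduce p on average
```

The weights are chosen so that $w_{v_{i-1}}(p)\,v_{i-1} + w_{v_i}(p)\,v_i = p$, so the rounded value is a randomization of $p$ in the sense defined above.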






3. Computing the well-calibrated forecasts 

A universal trading strategy, which will be defined in Section 4, is based on the well-calibrated forecasts of stock prices. In this section we present a randomized algorithm for computing well-calibrated forecasts using side information.



(Footnote: an induced function is in fact Lipschitz continuous with respect to some semimetric induced by the feature map; see Steinwart 2001, Lemma 3.)

Basic prediction protocol.
FOR i = 1, 2, ...
Reality announces a signal $x_i$.
Predictor announces a forecast $p_i$.
Reality announces an outcome $y_i \in [0, 1]$.
ENDFOR

Figure 1: Basic prediction protocol

A standard way to present any forecasting process is the perfect-information protocol (game). The most basic online perfect-information prediction protocol has two players, Reality and Predictor (see Fig. 1). In the perfect-information protocol, every player can see the other players' moves so far.

At the beginning of each step $i$, Predictor is given some data $x_i$ relevant to predicting the following outcome $y_i$. We call $x_i$ a signal or side information. Signals are taken from the object space. Past outcomes and predictions are also known to Predictor in the perfect-information protocol.

The outcomes $y_i$ are taken from an outcome space and the predictions $p_i$ are taken from a prediction space. In this paper an outcome is a real number from the unit interval $[0,1]$ and a forecast is a single number from this interval (which can be the output of a random variable), whereas Kakade and Foster considered a finite outcome space and a probability distribution on this space as a forecast. We can interpret the forecast $p_i$ as the mean value of a future outcome $y_i$ under some probability distribution on $[0,1]$ that is unknown to us.

In what follows we compare two types of forecasting algorithms: randomized algorithms, which we will construct, and stationary forecasting strategies, which are continuous functions $D$ from some RKHS using side information as input. We consider two predictors $D$ and $C$ playing according to the basic prediction protocol (see Fig. 1).

At the beginning of each step $i$ Predictor $D$ and Predictor $C$ are given a signal $x_i$. Predictor $D$ uses a stationary prediction strategy $D(x_i)$, where $D$ is a function whose input is the signal $x_i$. We suppose that $x_i$ is a real number from the unit interval. The number $x_i$ can encode any information. For example, it can even be the future outcome $y_i$.

Predictor $C$ uses a randomized strategy which we will define below. We collect all information used for the internal randomization in a vector $\bar x_i$. This vector can contain any information known before the move of Predictor $C$ at step $i$: the signal $x_i$, past outcomes and so on.

For example, in Section 4 the information is the one-dimensional vector $\bar x_i = y_{i-1}$, the past outcome; in Section 6, $\bar x_i = (y_{i-1}, x_i)$ is the pair of the past outcome and the signal.

In general, we suppose that $\bar x_i$ is a vector of dimension $k \ge 1$: $\bar x_i \in [0,1]^k$. We call it an information vector and assume that some method for computing information vectors given past outcomes and signals is fixed.

We use tests of calibration to measure the discrepancy between predictions and outcomes. These tests are based on checking rules. We consider checking rules of a more general type than those used in the literature on asymptotic calibration.

For any measurable subset $\mathcal{S} \subseteq [0,1]^{k+1}$ define the checking rule

$$I_{\mathcal{S}}(p,\bar x) = \begin{cases} 1 & \text{if } (p,\bar x) \in \mathcal{S},\\ 0 & \text{otherwise,} \end{cases}$$

where $\bar x$ is a $k$-dimensional vector.

In Section 3 we get $k = 1$ and $\mathcal{S} = \{(p,y) : p > y\}$ or $\mathcal{S} = \{(p,y) : p \le y\}$, where $p, y \in [0,1]$. In Section 6, $k = 2$ and the set $\mathcal{S}$ is defined in a more complex way.

In the online prediction protocol defined in Fig. 1, given $\Delta > 0$, a sequence of forecasts $p_1, p_2, \dots$ is called $\Delta$-calibrated for sequences of outcomes $y_1, y_2, \dots$ and information vectors $\bar x_1, \bar x_2, \dots$ if the asymptotic inequality

$$\limsup_{n\to\infty}\left|\frac{1}{n}\sum_{i=1}^{n} I_{\mathcal{S}}(p_i,\bar x_i)(y_i - p_i)\right| \le \Delta$$

holds for all measurable subsets $\mathcal{S} \subseteq [0,1]^{k+1}$. The sequence of forecasts is called well-calibrated if

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} I_{\mathcal{S}}(p_i,\bar x_i)(y_i - p_i) = 0 \tag{3}$$

for all measurable subsets $\mathcal{S} \subseteq [0,1]^{k+1}$.
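For a concrete checking rule the averaged deviation in (3) is easy to evaluate. The Python sketch below uses the case $k = 1$ with the past outcome as the information vector and the sets $\{(p,y) : p > y\}$ and $\{(p,y) : p \le y\}$. The names `checked_deviation`, `rise` and `fall` and the toy numbers are assumptions introduced only for this illustration.

```python
def checked_deviation(forecasts, info, outcomes, in_S):
    """Return (1/n) * sum_i I_S(p_i, x_i) * (y_i - p_i), where the set S
    is given by the membership predicate in_S(p, x)."""
    n = len(forecasts)
    total = sum(y - p
                for p, x, y in zip(forecasts, info, outcomes)
                if in_S(p, x))
    return total / n


def rise(p, x):
    """Membership in S = {(p, y) : p > y}."""
    return p > x


def fall(p, x):
    """Membership in the complementary set {(p, y) : p <= y}."""
    return p <= x


outcomes = [0.5, 0.6, 0.4, 0.7]
info = [0.0] + outcomes[:-1]          # information vector: the past outcome
forecasts = [0.5, 0.55, 0.5, 0.6]

print(checked_deviation(forecasts, info, outcomes, rise))
print(checked_deviation(forecasts, info, outcomes, fall))
```

Each checking rule looks only at the steps it selects, so a forecaster can have a small overall error and still fail calibration on one of the two sets.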

If Reality acts adversarially, then, as shown by Oakes (1985) and Dawid (1985), any deterministic forecasting algorithm will not always be calibrated.

Following the method of Foster and Vohra (1998), at any step $i$ we will define a deterministic forecast $p_i$ and randomize it to a random variable $\tilde p_i$ using the sequential method of randomization defined in Section 2. We also randomize the information vector $\bar x_i$ to a random vector $\tilde{\bar x}_i$.

Let Pr be the overall probability distribution generated by this sequential method of randomization. We will show that for any measurable subset $\mathcal{S} \subseteq [0,1]^{k+1}$, with Pr-probability one, the equality (3) is valid when $p_i$ and $\bar x_i$ are replaced by their randomized variants $\tilde p_i$ and $\tilde{\bar x}_i$.

The following theorem on calibration with side information is the main tool for the analysis presented in Sections 4 and 6.

In the perfect-information prediction protocol defined in Fig. 1, let $y_1, y_2, \dots$ be a sequence of outcomes and $x_1, x_2, \dots$ be the corresponding sequence of signals given online. We assume that a sequence of information vectors $\bar x_1, \bar x_2, \dots \in \mathcal{R}^k$ is also defined online.

Let also $\mathcal{F}$ be an RKHS on $[0,1]$ with a kernel $R(x,x')$ and a finite embedding constant $c_{\mathcal{F}}$.

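Theorem 1 For any $\epsilon > 0$, an algorithm for computing forecasts $p_1, p_2, \dots$ and a sequential method of randomization can be constructed such that two conditions hold:

• (i) for any $\delta > 0$, with probability at least $1 - \delta$, for any measurable subset $\mathcal{S} \subseteq [0,1]^{k+1}$,

$$\left|\sum_{i=1}^{n} I_{\mathcal{S}}(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)\right| \le 4e\left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}\,n^{1-\frac{1}{k+3}+\epsilon} + \sqrt{\frac{n}{2}\ln\frac{2}{\delta}} \tag{4}$$

for all $n$, where $I_{\mathcal{S}}$ is the characteristic function of $\mathcal{S}$, $\tilde p_1, \tilde p_2, \dots$ are the corresponding randomizations of $p_1, p_2, \dots$ and $\tilde{\bar x}_1, \tilde{\bar x}_2, \dots$ are the corresponding randomizations of the $k$-dimensional information vectors $\bar x_1, \bar x_2, \dots$;

• (ii) for any $D \in \mathcal{F}$,

$$\left|\sum_{i=1}^{n} D(x_i)(y_i - p_i)\right| \le \|D\|_{\mathcal{F}}\sqrt{(c_{\mathcal{F}}^2+1)\,n} \tag{5}$$

for all $n$, where $x_1, x_2, \dots$ are signals.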



Proof. At first, in Proposition 2 (below), given $\Delta > 0$, we modify the randomized rounding algorithm of Kakade and Foster (2004) to construct some $\Delta$-calibrated forecasting algorithm, and combine it with Vovk's (2005a) defensive forecasting algorithm. After that, we revise it by applying a variant of the doubling trick argument such that (4) will hold.

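Proposition 2 Under the assumptions of Theorem 1, an algorithm for computing forecasts and a method of randomization can be constructed such that the inequality (5) holds for all $D$ from the RKHS $\mathcal{F}$ and for all $n$. Also, for any $\delta > 0$, with probability at least $1 - \delta$,

$$\left|\sum_{i=1}^{n} I_{\mathcal{S}}(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)\right| \le \Delta n + \sqrt{\frac{n(c_{\mathcal{F}}^2+1)}{\Delta^{k+1}}} + \sqrt{\frac{n}{2}\ln\frac{2}{\delta}}$$

holds for all $n$, where $I_{\mathcal{S}}$ is the characteristic function of any measurable subset $\mathcal{S} \subseteq [0,1]^{k+1}$.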

Proof. We define a deterministic forecast and after that we randomize it.

The partition $V = \{v_0,\dots,v_K\}$ and the probabilities of rounding were defined above by (2). In what follows we round a deterministic forecast $p_n$ to $v_{i-1}$ with probability $w_{v_{i-1}}(p_n)$ and to $v_i$ with probability $w_{v_i}(p_n)$. We also round each coordinate $x_{n,s}$, $s = 1,\dots,k$, of the information vector $\bar x_n$ to $v_{j_s-1}$ with probability $w_{v_{j_s-1}}(x_{n,s})$ and to $v_{j_s}$ with probability $w_{v_{j_s}}(x_{n,s})$, where $x_{n,s} \in [v_{j_s-1}, v_{j_s}]$.

Let $W_v(p_n,\bar x_n) = w_{v^1}(p_n)\,w_{v^2}(\bar x_n)$, where $v = (v^1,v^2)$, $v^1 \in V$, $v^2 = (v^2_1,\dots,v^2_k) \in V^k$, $w_{v^2}(\bar x_n) = \prod_{s=1}^{k} w_{v^2_s}(x_{n,s})$, and let $W(p_n,\bar x_n) = (W_v(p_n,\bar x_n) : v \in V^{k+1})$ be the vector of the probability distribution on $V^{k+1}$. Define the corresponding kernel $K(p,\bar x,p',\bar x') = (W(p,\bar x)\cdot W(p',\bar x'))$.

Let the deterministic forecasts $p_1,\dots,p_{n-1}$ be already defined (put $p_1 = 1/2$). We want to define a deterministic forecast $p_n$.

The kernel $R(x,x')$ can be represented as a dot product in some feature space: $R(x,x') = (\Phi(x)\cdot\Phi(x'))$. Consider

$$U_n(p) = \sum_{i=1}^{n-1}\big(K(p,\bar x_n,p_i,\bar x_i) + R(x_n,x_i)\big)(y_i - p_i). \tag{6}$$



The following lemma presents a general method for computing deterministic forecasts. Define $M_0 = 1$ and

$$M_n = M_{n-1} + U_n(p_n)(y_n - p_n)$$

for all $n$.

Lemma 3 (Vovk et al. 2005) A sequence of forecasts $p_1, p_2, \dots$ can be computed such that $M_n \le M_{n-1}$ for all $n$.

Proof. By definition the function $U_n(p)$ is continuous in $p$. The needed forecast is computed as follows. If $U_n(p) > 0$ for all $p \in [0,1]$ then define $p_n = 1$; if $U_n(p) < 0$ for all $p \in [0,1]$ then define $p_n = 0$. Otherwise, define $p_n$ to be a root of the equation $U_n(p) = 0$ (some root exists by the intermediate value theorem). Evidently, $M_n \le M_{n-1}$ for all $n$. The lemma is proved.
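The choice of the forecast in the proof of Lemma 3 can be sketched in Python as follows. The sketch is an illustration under stated assumptions: the function $U_n$ is passed in as an arbitrary continuous callable `U`, the name `deterministic_forecast` is hypothetical, and instead of testing the sign of $U_n$ on the whole interval the sketch tests only the endpoints, which is enough to guarantee $U_n(p_n)(y_n - p_n) \le 0$ up to the bisection tolerance.

```python
def deterministic_forecast(U, tol=1e-12):
    """Return p in [0, 1] such that U(p) * (y - p) <= 0 for every y in
    [0, 1] (up to the bisection tolerance), for a continuous function U."""
    if U(1.0) >= 0.0:
        return 1.0            # y - 1 <= 0, so U(1) * (y - 1) <= 0
    if U(0.0) <= 0.0:
        return 0.0            # y - 0 >= 0, so U(0) * (y - 0) <= 0
    # Now U(0) > 0 > U(1): a root exists by the intermediate value theorem.
    lo, hi = 0.0, 1.0         # invariant: U(lo) > 0 and U(hi) <= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if U(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print(deterministic_forecast(lambda p: 0.3 - p))   # close to 0.3
```

Bisection is one possible root finder; any method that locates a zero of a continuous function on $[0,1]$ would serve the same purpose.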

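Let the forecasts $p_1, p_2, \dots$ be computed by the method of Lemma 3. Then for any $N$,

$$\begin{aligned}
0 \ge M_N - M_0 &= \sum_{n=1}^{N} U_n(p_n)(y_n - p_n) \\
&= \sum_{n=1}^{N}\sum_{i=1}^{n-1}\big(K(p_n,\bar x_n,p_i,\bar x_i) + R(x_n,x_i)\big)(y_i - p_i)(y_n - p_n) \\
&= \frac{1}{2}\sum_{n=1}^{N}\sum_{i=1}^{N} K(p_n,\bar x_n,p_i,\bar x_i)(y_i - p_i)(y_n - p_n) - \frac{1}{2}\sum_{n=1}^{N} K(p_n,\bar x_n,p_n,\bar x_n)(y_n - p_n)^2 \\
&\quad + \frac{1}{2}\sum_{n=1}^{N}\sum_{i=1}^{N} R(x_n,x_i)(y_i - p_i)(y_n - p_n) - \frac{1}{2}\sum_{n=1}^{N} R(x_n,x_n)(y_n - p_n)^2 \qquad (7) \\
&= \frac{1}{2}\Big\|\sum_{n=1}^{N} W(p_n,\bar x_n)(y_n - p_n)\Big\|^2 - \frac{1}{2}\sum_{n=1}^{N}\|W(p_n,\bar x_n)\|^2(y_n - p_n)^2 \qquad (8) \\
&\quad + \frac{1}{2}\Big\|\sum_{n=1}^{N}\Phi(x_n)(y_n - p_n)\Big\|_{\mathcal{F}}^2 - \frac{1}{2}\sum_{n=1}^{N}\|\Phi(x_n)\|_{\mathcal{F}}^2(y_n - p_n)^2. \qquad (9)
\end{aligned}$$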

In (8), $\|\cdot\|$ is the Euclidean norm, and in (9), $\|\cdot\|_{\mathcal{F}}$ is the norm in the RKHS $\mathcal{F}$. Since $(y_n - p_n)^2 \le 1$ for all $n$ and

$$\|W(p_n,\bar x_n)\|^2 = \sum_{v\in V^{k+1}} (W_v(p_n,\bar x_n))^2 \le \sum_{v\in V^{k+1}} W_v(p_n,\bar x_n) = 1,$$

the subtracted sum of (8) is upper bounded by $N$.

Since $\|\Phi(x_n)\|_{\mathcal{F}} = c_{\mathcal{F}}(x_n)$ and $c_{\mathcal{F}}(x) \le c_{\mathcal{F}}$ for all $x$, the subtracted sum of (9) is upper bounded by $c_{\mathcal{F}}^2 N$. As a result we obtain

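$$\Big\|\sum_{n=1}^{N} W(p_n,\bar x_n)(y_n - p_n)\Big\| \le \sqrt{(c_{\mathcal{F}}^2+1)N}, \tag{10}$$

$$\Big\|\sum_{n=1}^{N}\Phi(x_n)(y_n - p_n)\Big\|_{\mathcal{F}} \le \sqrt{(c_{\mathcal{F}}^2+1)N} \tag{11}$$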

for all $N$. Let us denote $\bar\mu_n = \sum_{i=1}^{n} W(p_i,\bar x_i)(y_i - p_i)$. By (10), $\|\bar\mu_n\| \le \sqrt{(c_{\mathcal{F}}^2+1)n}$ for all $n$.

Let $\bar\mu_n = (\mu_n(v) : v \in V^{k+1})$. By definition, for any $v$,

$$\mu_n(v) = \sum_{i=1}^{n} W_v(p_i,\bar x_i)(y_i - p_i). \tag{12}$$

Insert the term $I(v)$ in the sum (12), where $I$ is the characteristic function of an arbitrary set $\mathcal{S} \subseteq [0,1]^{k+1}$, sum over $v \in V^{k+1}$, and exchange the order of summation. Using the Cauchy-Schwarz inequality for the vectors $\bar I = (I(v) : v \in V^{k+1})$ and $\bar\mu_n = (\mu_n(v) : v \in V^{k+1})$ and the Euclidean norm, we obtain

$$\left|\sum_{i=1}^{n}\sum_{v\in V^{k+1}} W_v(p_i,\bar x_i)I(v)(y_i - p_i)\right| = \left|\sum_{v\in V^{k+1}} I(v)\sum_{i=1}^{n} W_v(p_i,\bar x_i)(y_i - p_i)\right| = |(\bar I\cdot\bar\mu_n)| \le \|\bar I\|\cdot\|\bar\mu_n\| \le \sqrt{|V^{k+1}|(c_{\mathcal{F}}^2+1)n} \tag{13}$$

for all $n$, where $|V^{k+1}| = 1/\Delta^{k+1}$ is the cardinality of the partition.

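Let $\tilde p_i$ be a random variable taking values $v \in V$ with probabilities $w_v(p_i)$ (only two of them are nonzero). Recall that $\tilde{\bar x}_i$ is a random variable taking values $v \in V^k$ with probabilities $w_v(\bar x_i)$.

Let $\mathcal{S} \subseteq [0,1]^{k+1}$ and let $I$ be its indicator function. For any $i$, the mathematical expectation of the random variable $I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)$ is equal to

$$E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)) = \sum_{v\in V^{k+1}} W_v(p_i,\bar x_i)I(v)(y_i - v^1), \tag{14}$$

where $v = (v^1,v^2)$. By the Azuma-Hoeffding inequality (see (26) below), for any $\delta > 0$, with Pr-probability $1 - \delta$,

$$\left|\sum_{i=1}^{n} I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i) - \sum_{i=1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \le \sqrt{\frac{n}{2}\ln\frac{2}{\delta}}. \tag{15}$$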



By the definition of the deterministic forecast,

$$\left|\sum_{v\in V^{k+1}} W_v(p_i,\bar x_i)I(v)(y_i - p_i) - \sum_{v\in V^{k+1}} W_v(p_i,\bar x_i)I(v)(y_i - v^1)\right| \le \Delta$$

for all $i$, where $v = (v^1,v^2)$. Summing (14) over $i = 1,\dots,n$ and using the inequality (13), we obtain

$$\left|\sum_{i=1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| = \left|\sum_{i=1}^{n}\sum_{v\in V^{k+1}} W_v(p_i,\bar x_i)I(v)(y_i - v^1)\right| \le \Delta n + \sqrt{(c_{\mathcal{F}}^2+1)n/\Delta^{k+1}} \tag{16}$$

for all $n$.



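By (15) and (16), with Pr-probability $1 - \delta$,

$$\left|\sum_{i=1}^{n} I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)\right| \le \Delta n + \sqrt{(c_{\mathcal{F}}^2+1)n/\Delta^{k+1}} + \sqrt{\frac{n}{2}\ln\frac{2}{\delta}}. \tag{17}$$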



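By the Cauchy-Schwarz inequality and (11),

$$\left|\sum_{n=1}^{N} D(x_n)(y_n - p_n)\right| = \left|\sum_{n=1}^{N}(y_n - p_n)(D\cdot\Phi(x_n))\right| = \left|\Big(\sum_{n=1}^{N}(y_n - p_n)\Phi(x_n)\cdot D\Big)\right| \le \Big\|\sum_{n=1}^{N}(y_n - p_n)\Phi(x_n)\Big\|_{\mathcal{F}}\cdot\|D\|_{\mathcal{F}} \le \|D\|_{\mathcal{F}}\sqrt{(c_{\mathcal{F}}^2+1)N}.$$

The proposition is proved.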

Now we turn to the proof of Theorem 1.

The expression $\Delta n + \sqrt{(c_{\mathcal{F}}^2+1)n/\Delta^{k+1}}$ from (16) and (17) takes its minimal value for $\Delta = \left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}n^{-\frac{1}{k+3}}$. In this case, the right-hand side of the inequality (16) is equal to

$$\Delta n + \sqrt{n(c_{\mathcal{F}}^2+1)/\Delta^{k+1}} \le 2\Delta n = 2\left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}n^{1-\frac{1}{k+3}}. \tag{18}$$

In what follows we use the upper bound $2\Delta n$ in (16).
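The minimizing value of $\Delta$ and the bound $2\Delta n$ can be verified by a short calculation. In the following sketch $\Delta$ is treated as a continuous variable, and $f$ and $a$ are shorthand introduced only here:

```latex
\begin{aligned}
f(\Delta) &= \Delta n + a\,\Delta^{-\frac{k+1}{2}}, \qquad a = \sqrt{(c_{\mathcal{F}}^2+1)\,n}, \\
f'(\Delta) &= n - \frac{k+1}{2}\,a\,\Delta^{-\frac{k+3}{2}} = 0
  \iff \Delta^{\frac{k+3}{2}} = \frac{k+1}{2}\cdot\frac{a}{n}
  \iff \Delta = \Big(\frac{k+1}{2}\Big)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}\,n^{-\frac{1}{k+3}}, \\
a\,\Delta^{-\frac{k+1}{2}} &= \frac{2}{k+1}\,\Delta n \le \Delta n \quad (k \ge 1)
  \;\Longrightarrow\; f(\Delta) \le 2\Delta n .
\end{aligned}
```

The function $f$ is convex for $\Delta > 0$, so the stationary point is indeed the minimum; the second term is at most the first one because $k \ge 1$.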

To prove the bound (4), choose a monotonic sequence of rational numbers $\Delta_1 > \Delta_2 > \dots$ such that $\Delta_s \to 0$ as $s \to \infty$. We also define an increasing sequence of positive integers $n_1 < n_2 < \dots$. For any $s$, on the steps $n_s \le n < n_{s+1}$ we use for randomization the partition of $[0,1]$ into subintervals of length $\Delta_s$.

We start our sequences from $n_1 = 1$ and $\Delta_1 = 1$. Also, define the numbers $n_2, n_3, \dots$ such that the inequality

$$\left|\sum_{i=1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \le 4(s+1)\Delta_s n \tag{19}$$

holds for all $n_s \le n \le n_{s+1}$ and for all $s \ge 1$.

We define this sequence by mathematical induction on $s$. Suppose that $n_s$ ($s > 1$) is defined such that the inequality

$$\left|\sum_{i=1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \le 4s\Delta_{s-1}n \tag{20}$$

holds for all $n_{s-1} \le n \le n_s$, and the inequality

$$\left|\sum_{i=1}^{n_s} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \le 4s\Delta_s n_s \tag{21}$$

also holds.

Let us define $n_{s+1}$. Consider all forecasts $\tilde p_i$ defined by the algorithm given above for the discretization $\Delta = \Delta_{s+1}$. We do not use the first $n_s$ of these forecasts (more correctly, we will use them only in the bounds (22) and (23); denote these forecasts $\hat p_1, \dots, \hat p_{n_s}$). We add the forecasts $\tilde p_i$ for $i > n_s$ to the forecasts defined before this step of induction (for $n_s$). Let $n_{s+1}$ be such that the inequality

$$\begin{aligned}
\left|\sum_{i=1}^{n_{s+1}} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right|
&\le \left|\sum_{i=1}^{n_s} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \\
&\quad + \left|\sum_{i=n_s+1}^{n_{s+1}} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)) + \sum_{i=1}^{n_s} E(I(\hat p_i,\tilde{\bar x}_i)(y_i - \hat p_i))\right| \\
&\quad + \left|\sum_{i=1}^{n_s} E(I(\hat p_i,\tilde{\bar x}_i)(y_i - \hat p_i))\right| \\
&\le 4(s+1)\Delta_{s+1}n_{s+1} \qquad (22)
\end{aligned}$$

holds. Here the first sum on the right-hand side of the inequality (22) is bounded by $4s\Delta_s n_s$ by the induction hypothesis (21). The second and third sums are bounded by $2\Delta_{s+1}n_{s+1}$ and by $2\Delta_{s+1}n_s$, respectively, where $\Delta = \Delta_{s+1}$ is defined such that (18) holds. This follows from (16) and from the choice of $n_s$.



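The induction hypothesis (21) is valid for

$$n_{s+1} \ge \frac{2s\Delta_s + \Delta_{s+1}}{\Delta_{s+1}(2s+1)}\,n_s.$$

Similarly,

$$\begin{aligned}
\left|\sum_{i=1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right|
&\le \left|\sum_{i=1}^{n_s} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \\
&\quad + \left|\sum_{i=n_s+1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)) + \sum_{i=1}^{n_s} E(I(\hat p_i,\tilde{\bar x}_i)(y_i - \hat p_i))\right| \\
&\quad + \left|\sum_{i=1}^{n_s} E(I(\hat p_i,\tilde{\bar x}_i)(y_i - \hat p_i))\right| \\
&\le 4(s+1)\Delta_s n \qquad (23)
\end{aligned}$$

for $n_s < n \le n_{s+1}$. Here the first sum on the right-hand side of the inequality (23) is also bounded by $4s\Delta_s n_s \le 4s\Delta_s n$ by the induction hypothesis (21). The second and the third sums are bounded by $2\Delta_{s+1}n \le 2\Delta_s n$ and by $2\Delta_{s+1}n_s \le 2\Delta_s n$, respectively. This follows from (16) and from the choice of $\Delta_s$. The induction hypothesis (20) is valid.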
By (19), for any $s$,

$$\left|\sum_{i=1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \le 4(s+1)\Delta_s n \tag{24}$$

for all $n \ge n_s$ if $\Delta_s$ satisfies the condition $\Delta_{s+1} \le \Delta_s\left(1 - \frac{1}{s+2}\right)$ for all $s$.

We show now that sequences $n_s$ and $\Delta_s$ satisfying all the conditions above exist.

Let $\epsilon > 0$ and $M = \lceil 1/\epsilon\rceil$, where $\lceil r\rceil$ is the least integer $m$ such that $m \ge r$. Define $n_s = (s+M)^M$ and $\Delta_s = \left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}n_s^{-\frac{1}{k+3}}$. It is easy to verify that all the requirements on $n_s$ and $\Delta_s$ given above are satisfied when $\epsilon$ is sufficiently small.
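For concreteness, the sequences $n_s = (s+M)^M$ and $\Delta_s$ can be tabulated with a few lines of Python. This is only a sketch: the function name `schedule` and the parameter values in the example ($k = 1$, $c_{\mathcal{F}} = 0$, $\epsilon = 0.5$) are assumptions chosen to keep the numbers small, not values used in the paper.

```python
import math


def schedule(s, k, c, eps):
    """Return (n_s, Delta_s) for stage s, information-vector dimension k,
    embedding constant c and accuracy parameter eps."""
    M = math.ceil(1 / eps)                    # M = ceil(1 / eps)
    n_s = (s + M) ** M                        # first step of stage s
    delta_s = (((k + 1) / 2) ** (2 / (k + 3))
               * (c * c + 1) ** (1 / (k + 3))
               * n_s ** (-1 / (k + 3)))       # rounding precision of stage s
    return n_s, delta_s


for s in range(1, 5):
    print(s, schedule(s, k=1, c=0.0, eps=0.5))
```

The stage boundaries $n_s$ grow polynomially in $s$ while the precision $\Delta_s$ decreases, which is the pattern used by the doubling-trick argument.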



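We have in (24), for all $n_s \le n \le n_{s+1}$,

$$\begin{aligned}
4(s+1)\Delta_s n &\le 4(s+M)\Delta_s n_{s+1} \\
&= 4\left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}(s+M)(s+M+1)^M(s+M)^{-\frac{M}{k+3}} \\
&\le 4e\left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}\,n_s^{1-\frac{1}{k+3}+\frac{1}{M}} \\
&\le 4e\left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}\,n^{1-\frac{1}{k+3}+\epsilon},
\end{aligned}$$

where $e$ is the base of the natural logarithm. Therefore, we obtain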



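$$\left|\sum_{i=1}^{n} E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))\right| \le 4e\left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}\,n^{1-\frac{1}{k+3}+\epsilon} \tag{25}$$

for all $n$. The Azuma-Hoeffding inequality says that for any $\gamma > 0$

$$\Pr\left\{\left|\frac{1}{n}\sum_{i=1}^{n} V_i\right| > \gamma\right\} \le 2e^{-2n\gamma^2} \tag{26}$$

for all $n$, where $V_i$ are martingale differences.

We get $V_i = I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i) - E(I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i))$ and $\gamma = \sqrt{\frac{1}{2n}\ln\frac{2}{\delta}}$, where $\delta > 0$.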
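As a quick check of this choice, substituting $\gamma = \sqrt{\frac{1}{2n}\ln\frac{2}{\delta}}$ into the Azuma-Hoeffding bound gives failure probability exactly $\delta$ and the deviation term that appears in (15) and (17):

```latex
\begin{aligned}
2e^{-2n\gamma^2} &= 2\exp\Big(-2n\cdot\frac{1}{2n}\ln\frac{2}{\delta}\Big) = 2\cdot\frac{\delta}{2} = \delta, \\
\Big|\sum_{i=1}^{n} V_i\Big| &\le n\gamma = \sqrt{\frac{n}{2}\ln\frac{2}{\delta}}
\quad\text{with probability at least } 1-\delta .
\end{aligned}
```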



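Denote $\nu(n) = 4e\left(\frac{k+1}{2}\right)^{\frac{2}{k+3}}(c_{\mathcal{F}}^2+1)^{\frac{1}{k+3}}\,n^{1-\frac{1}{k+3}+\epsilon}$.

Combining (25) with (26), we obtain that for any $\delta > 0$, with probability $1 - \delta$,

$$\left|\sum_{i=1}^{n} I(\tilde p_i,\tilde{\bar x}_i)(y_i - \tilde p_i)\right| \le \nu(n) + \sqrt{\frac{n}{2}\ln\frac{2}{\delta}}$$

for all $n$. Theorem 1 is proved.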



4. Competing with stationary trading strategies from RKHS 

A trading game has two players: Trader and Stock Market. They correspond to Predictor and Reality in the simple prediction game defined in Section 3.

We suppose that the prices $S_1, S_2, \dots$ of a stock are bounded and rescaled such that $0 \le S_i \le 1$ for all $i$. We also put $S_0 = 0$. These prices are the analogs of the outcomes of the prediction game.

We present the process of online trading in the Stock Market in the form of a trading game regulated by the perfect-information protocol presented in Fig. 2.






Basic trading protocol.
FOR $i = 1, 2, \dots$

Stock Market announces a signal $x_i \in X$.

Trader bets by buying or selling a number $C_i$ of shares of the stock at the price $S_{i-1}$ each.
Stock Market reveals a price $S_i$ of the stock.

Trader receives his total gain (or suffers a loss) at the end of step $i$:
$\mathcal K_i = \mathcal K_{i-1} + C_i(S_i - S_{i-1})$. We set $\mathcal K_0 = 0$.

ENDFOR

Figure 2: Basic trading protocol
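The protocol of Fig. 2 can be simulated in a few lines. The following is a minimal sketch, not the author's implementation: the function name `run_basic_protocol` and the callback signature `strategy(signal, prev_price)` are assumptions introduced only for illustration.

```python
from typing import Callable, List, Sequence


def run_basic_protocol(
    prices: Sequence[float],
    signals: Sequence[float],
    strategy: Callable[[float, float], float],
    s0: float = 0.0,
) -> List[float]:
    """Return the cumulative gains K_1, ..., K_n of the basic trading protocol.

    `strategy(signal, prev_price)` is a hypothetical callback returning C_i,
    the number of shares bought (positive) or sold (negative) at step i.
    """
    gains = []
    capital = 0.0  # K_0 = 0
    prev = s0      # S_0
    for price, signal in zip(prices, signals):
        shares = strategy(signal, prev)    # C_i is chosen before S_i is revealed
        capital += shares * (price - prev)  # K_i = K_{i-1} + C_i (S_i - S_{i-1})
        gains.append(capital)
        prev = price
    return gains
```

With a constant strategy $C_i = 1$ the total gain telescopes to $S_n - S_0$, which gives a quick sanity check of the update rule.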

At the beginning of each step $i$ Trader is given an object $x_i \in X$, which is called the side information at step $i$. Without loss of generality, suppose that $X = [0, 1]$.

We call the sequence $C_i$ a trading strategy. If $C_i > 0$, Trader is playing for a rise; if $C_i < 0$, Trader is playing for a fall; Trader passes the step if $C_i = 0$. We suppose that Trader can borrow money for buying $C_i$ shares and can incur debt.

A stationary trading strategy is a function $D$ from $X$ to $\mathbb R$. We suppose that some RKHS $\mathcal F$ on $X$ with a kernel $R(x, x')$ and a finite embedding constant $c_{\mathcal F}$ is given.

Any stationary trading strategy $D$ uses at step $i$ the side information, a real number $x_i \in X$.

Our universal trading strategy will be randomized. By a randomized trading strategy we mean a sequence $M_i$, $i = 1, 2, \dots$, of random variables.

The universal trading strategy which we define below uses the past price $S_{i-1}$ of the stock as a one-dimensional information vector in the sense of Theorem 1, where $S_0 = 0$. This information is used for the internal randomization.

We define a universal trading strategy as a random variable $M_i$ and show that this trading strategy performs almost surely at least as well as any stationary trading strategy $D \in \mathcal F$ using arbitrary side information $x_i$.

To be more precise, we define in Fig. 3 the perfect-information protocol of the game with two traders: Trader M uses the randomized strategy $M_i$, and Trader D uses an arbitrary stationary trading strategy $D \in \mathcal F$.

This protocol is more general than two basic trading protocols (Fig. 2) run together, since Stock Market can use information on the decisions of both traders M and D before revealing a future price $S_i$.

At first, for simplicity, we consider the case of dealing for a rise, since the proof of optimality (Theorem 4) is much clearer in this case than in the general case (Theorem 5). Also, the series of numerical experiments presented in Section 7 is performed for the case where both traders deal for a rise. The case of dealing for a fall is considered similarly.

At each step $i$ we compute a forecast $p_i$ of the future price and randomize it to $\tilde p_i$. We also randomize the past price $S_{i-1}$ of the stock to $\tilde S_{i-1}$. Details of this computation and randomization are given in Section 3. Our universal strategy is a randomized decision rule; it takes only two values:

\[ M_i^1 = \begin{cases} 1 & \text{if } \tilde p_i > \tilde S_{i-1}, \\ 0 & \text{otherwise.} \end{cases} \]

Trading protocol with two traders.
FOR $i = 1, 2, \dots$

Stock Market announces a signal $x_i$.

Trader M bets by buying or selling the random number $M_i$ of shares of the stock at the price $S_{i-1}$ each.

Trader D bets by buying or selling a number $D(x_i)$ of shares of the stock at the price $S_{i-1}$ each.
Stock Market reveals a price $S_i$ of the stock.

Trader M receives his total gain (or suffers a loss) at the end of step $i$:
$\mathcal K_i^M = \mathcal K_{i-1}^M + M_i(S_i - S_{i-1})$. We set $\mathcal K_0^M = 0$.

Trader D receives his total gain (or suffers a loss) at the end of step $i$:
$\mathcal K_i^D = \mathcal K_{i-1}^D + D(x_i)(S_i - S_{i-1})$. We set $\mathcal K_0^D = 0$.
ENDFOR

Figure 3: Trading protocol with two traders
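As an illustration of the two-trader protocol, the sketch below runs Trader M (dealing for a rise) against a stationary rule $D$. It is a simplification under stated assumptions: the forecasts are supplied from outside, and the randomization of $p_i$ and $S_{i-1}$ described in Section 3 is omitted, so the comparison `p > prev` stands in for $\tilde p_i > \tilde S_{i-1}$. The name `run_two_traders` is hypothetical.

```python
from typing import Callable, Sequence, Tuple


def run_two_traders(
    prices: Sequence[float],
    signals: Sequence[float],
    forecasts: Sequence[float],
    d_rule: Callable[[float], float],
    s0: float = 0.0,
) -> Tuple[float, float]:
    """Return the final gains (K_n^M, K_n^D) of Trader M and Trader D."""
    k_m = 0.0  # K_0^M = 0
    k_d = 0.0  # K_0^D = 0
    prev = s0
    for price, signal, forecast in zip(prices, signals, forecasts):
        # Dealing for a rise: hold one share iff the forecast exceeds the past price.
        # Assumption: no randomization is applied in this sketch.
        m = 1.0 if forecast > prev else 0.0
        k_m += m * (price - prev)
        k_d += d_rule(signal) * (price - prev)
        prev = price
    return k_m, k_d
```

When the forecasts always point in the right direction, Trader M collects every price increase and skips every decrease, while a constant rule $D \equiv 1$ only earns $S_n - S_0$.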



Assume that the prices $S_1, S_2, \dots \in [0,1]$ and the signals $x_1, x_2, \dots \in [0,1]$ are given online according to the protocol presented in Fig. 3. Denote $\Delta S_i = S_i - S_{i-1}$. We use the norm

\[ \|D\|_\infty = \sup_{x\in[0,1]} |D(x)|, \]

where $D$ is a nonnegative continuous function. If a function $D$ is not identically zero then $\|D\|_\infty > 0$. We call such a function nontrivial.

Informally, our main result says that if the forecasts $p_i$ are well-calibrated on the sequence of prices $S_i$, $i = 1, 2, \dots$, then Trader M performs at least as well as any Trader D playing for a rise.

Theorem 4 An algorithm for computing forecasts $p_i$ and a sequential method of randomization can be constructed such that for any nontrivial nonnegative stationary trading strategy $D \in \mathcal F$

\[ \liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i^1\Delta S_i - \frac{1}{n}\|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)\Delta S_i\right) \ge 0 \tag{27} \]

holds almost surely with respect to a probability distribution generated by the corresponding sequential randomization.

Moreover, for any $\varepsilon > 0$ this trading strategy $M^1$ can be tuned such that for any $\delta > 0$, with probability at least $1-\delta$, for all nontrivial nonnegative $D \in \mathcal F$

\[ \sum_{i=1}^n M_i^1\Delta S_i \ge \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)\Delta S_i - \frac{4}{3}(7e-1)(c_{\mathcal F}^2+1)^{\frac14} n^{\frac34+\varepsilon} - \|D\|_\infty^{-1}\|D\|_{\mathcal F}\sqrt{(c_{\mathcal F}^2+1)n} - \sqrt{\frac{n}{2}\ln\frac{2}{\delta}} \tag{28} \]

for all $n$, where $e$ is the base of the natural logarithm.

Proof. We use the randomized trading strategy $M^1$ based on the well-calibrated forecasts defined in Section 3, where $y_i = S_i$ and $x_i = S_{i-1}$.

Recall that $\varepsilon > 0$ and $M = \lceil 1/\varepsilon \rceil$. At any step $i$ we compute the deterministic forecast $p_i$ defined in Section 3 and its randomization $\tilde p_i$ using the parameters $\Delta = \Delta_s = (c_{\mathcal F}^2+1)^{\frac14}(s+M)^{-\frac{M}{4}}$ and $n_s = (s+M)^M$, where $n_s \le i < n_{s+1}$. Let also $\tilde S_{i-1}$ be a randomization of the past price $S_{i-1}$. The following upper bound follows directly from the method of discretization:

\[ \left|\sum_{i=1}^n I(\tilde p_i > \tilde S_{i-1})(\tilde S_{i-1} - S_{i-1})\right| \le \sum_{t=0}^{s}(n_{t+1}-n_t)\Delta_t \le \frac{4}{3}(e-1)(c_{\mathcal F}^2+1)^{\frac14} n_s^{\frac34+\varepsilon} \le \frac{4}{3}(e-1)(c_{\mathcal F}^2+1)^{\frac14} n^{\frac34+\varepsilon}. \tag{29} \]

Let $D(x)$ be an arbitrary nontrivial nonnegative trading strategy from the RKHS $\mathcal F$. Clearly, the bound (29) holds if we replace $I(\tilde p_i > \tilde S_{i-1})$ by $\|D\|_\infty^{-1}D(x_i)$.



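Let $M^1$ be the randomized trading strategy defined above. We use abbreviations:

\[ \nu_1(n) = (c_{\mathcal F}^2+1)^{\frac14}(e-1)\frac{4}{3}\, n^{\frac34+\varepsilon}, \tag{30} \]

\[ \nu_2(n) = 4e\, n^{\frac34+\varepsilon}(c_{\mathcal F}^2+1)^{\frac14} + \sqrt{\frac{n}{2}\ln\frac{2}{\delta}}, \tag{31} \]

\[ \nu_3(n) = \sqrt{(c_{\mathcal F}^2+1)n}. \tag{32} \]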


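All sums below are for $i = 1, \dots, n$. By definition $0 \le D(x_i) \le \|D\|_\infty$ for all $x_i \in [0, 1]$.

Let $\delta > 0$. Then, with probability $1-\delta$, for any $D \in \mathcal F$, the following chain of equalities and inequalities is valid:

\[
\begin{aligned}
\sum_{i=1}^n M_i^1(S_i - S_{i-1}) &= \sum_{\tilde p_i > \tilde S_{i-1}} (S_i - S_{i-1}) = \\
&= \sum_{\tilde p_i > \tilde S_{i-1}} (S_i - \tilde p_i) + \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) + \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde S_{i-1} - S_{i-1}) \ge && (33) \\
&\ge \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) \ge && (34) \\
&\ge \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) = && (35) \\
&= \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(p_i - S_{i-1}) + \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(\tilde p_i - p_i) - \\
&\quad - \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(\tilde S_{i-1} - S_{i-1}) - \nu_1(n) - \nu_2(n) \ge && (36) \\
&\ge \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(p_i - S_{i-1}) - 3\nu_1(n) - \nu_2(n) \ge && (37)
\end{aligned}
\]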
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8=1 



i=l 

-4^i (n) - z/ 2 (n) - \\D\\^ \\D\\^u 3 (n) = 

n 

= \\D\\^Y,D(x t )(S l -S i „ l )- 
i=l 

-4i/i(n) - ^(n) - \\D\\^\\D\\ T u 3 (n). 



(38) 



(39) 



In the transition from (33) to (34), the inequality of Theorem 1 and the bound (29) were used, so the terms (30) and (31) were subtracted. The transition from (34) to (35) is valid since $0 \le D(x) \le \|D\|_\infty$ for all $x$. In the transition from (36) to (37), the bound (29) was applied twice to the intermediate terms, so its right-hand side was subtracted twice. In the transition from (37) to (38), the inequality (15) of Theorem 1 was used, so the term (32) was subtracted. In the transition from (38) to (39), we have again used an inequality of Theorem 1. Therefore, we have (28).

The inequality (27) follows from (28). Theorem 4 is proved.

Now, we consider the general case of dealing for a rise and for a fall. The corresponding trading strategy is defined as

\[ M_i = \begin{cases} 1 & \text{if } \tilde p_i > \tilde S_{i-1}, \\ -1 & \text{if } \tilde p_i \le \tilde S_{i-1}. \end{cases} \]

Trader D also deals for a rise and for a fall.

Let $S_1, S_2, \dots \in [0, 1]$ and $x_1, x_2, \dots \in [0, 1]$ be given online according to the protocol presented in Fig. 3.



Theorem 5 An algorithm for computing forecasts $p_i$ and a sequential method of randomization can be constructed such that for any nontrivial stationary trading strategy $D \in \mathcal F$,

\[ \liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i\Delta S_i - \frac{1}{n}\|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)\Delta S_i\right) \ge 0 \tag{40} \]

holds almost surely with respect to a probability distribution generated by the corresponding sequential randomization.

Moreover, for any $\varepsilon > 0$ this trading strategy $M$ can be tuned such that for any $\delta > 0$, with probability at least $1-\delta$, for all nontrivial $D \in \mathcal F$

\[ \sum_{i=1}^n M_i\Delta S_i \ge \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)\Delta S_i - \frac{8}{3}(5e-2)(c_{\mathcal F}^2+1)^{\frac14} n^{\frac34+\varepsilon} - \|D\|_\infty^{-1}\|D\|_{\mathcal F}\sqrt{(c_{\mathcal F}^2+1)n} - 2\sqrt{\frac{n}{2}\ln\frac{2}{\delta}} \tag{41} \]

for all $n$.

Proof. We use abbreviations (30)-(32) from the proof of Theorem 4. Define

\[ D^+(x) = \begin{cases} D(x) & \text{if } D(x) > 0, \\ 0 & \text{otherwise,} \end{cases} \]

and

\[ D^-(x) = \begin{cases} D(x) & \text{if } D(x) < 0, \\ 0 & \text{otherwise.} \end{cases} \]

By definition $D(x) = D^+(x) + D^-(x)$.
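The decomposition of a strategy into its positive and negative parts is easy to check numerically. In this small sketch the helper names `positive_part` and `negative_part` are hypothetical and serve only the illustration.

```python
from typing import Callable

Strategy = Callable[[float], float]


def positive_part(d: Strategy) -> Strategy:
    """D^+(x) = D(x) if D(x) > 0, and 0 otherwise."""
    return lambda x: d(x) if d(x) > 0 else 0.0


def negative_part(d: Strategy) -> Strategy:
    """D^-(x) = D(x) if D(x) < 0, and 0 otherwise."""
    return lambda x: d(x) if d(x) < 0 else 0.0


if __name__ == "__main__":
    d = lambda x: x - 0.5          # a strategy that changes sign on [0, 1]
    d_plus, d_minus = positive_part(d), negative_part(d)
    for x in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(x, d_plus(x), d_minus(x), d_plus(x) + d_minus(x) == d(x))
```

The positive part plays the role of a strategy dealing for a rise and the negative part that of a strategy dealing for a fall, which is how the proof reduces the general case to the argument of Theorem 4.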

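The proof of Theorem 5 is based on transformations similar to (33)-(39). Let $\delta > 0$. Then with probability $1-\delta$ for any $D \in \mathcal F$:

\[
\begin{aligned}
\sum_{i=1}^n M_i(S_i - S_{i-1}) &= \sum_{\tilde p_i > \tilde S_{i-1}} (S_i - S_{i-1}) - \sum_{\tilde p_i \le \tilde S_{i-1}} (S_i - S_{i-1}) = \\
&= \sum_{\tilde p_i > \tilde S_{i-1}} (S_i - \tilde p_i) + \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) + \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde S_{i-1} - S_{i-1}) - \\
&\quad - \sum_{\tilde p_i \le \tilde S_{i-1}} (S_i - \tilde p_i) - \sum_{\tilde p_i \le \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) - \sum_{\tilde p_i \le \tilde S_{i-1}} (\tilde S_{i-1} - S_{i-1}) \ge \\
&\ge \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) - \\
&\quad - \sum_{\tilde p_i \le \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) \ge \\
&\ge \|D\|_\infty^{-1}\sum_{\tilde p_i > \tilde S_{i-1}} D^+(x_i)(\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) + \\
&\quad + \|D\|_\infty^{-1}\sum_{\tilde p_i \le \tilde S_{i-1}} D^-(x_i)(\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) \ge \\
&\ge \|D\|_\infty^{-1}\sum_{i=1}^n D^+(x_i)(\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) + \\
&\quad + \|D\|_\infty^{-1}\sum_{i=1}^n D^-(x_i)(\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) = \\
&= \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(\tilde p_i - \tilde S_{i-1}) - 2\nu_1(n) - 2\nu_2(n) = \\
&= \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(p_i - S_{i-1}) + \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(\tilde p_i - p_i) - \\
&\quad - \|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)(\tilde S_{i-1} - S_{i-1}) - 2\nu_1(n) - 2\nu_2(n) \ge
\end{aligned}
\]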
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> \\D\\^ D( Xi )( Pl - Si-i) - 4i*(n) - 2v 2 {n) > 



i=l 



> \\D\\^Y,D&)(Si - Si-i) - WDW-^Dtem-pi) 
1=1 1=1 

-4i/i(n) - 2^ 2 (n) - ||I>||- 1 ||I>||.Fi*(fO 



|Z)|L 1 ^D(x i )(5 i -5 l _ 1 ) 



i=l 



-4^(n) - 2i/ 2 (n) - \\D\\^\\D\\ T u 3 (n). 



The proof of these transitions is similar to the proof of the transitions (33)-(39) in Theorem 4. This completes the proof of Theorem 5.

Theorem 5 can be restated for the strategy $M_i^l = lM_i$ and for the class of stationary strategies $D \in \mathcal F$ with bounded norm $\|D\|_\infty \le l$, where $l$ is an arbitrary positive integer.

We present the following statement for $M_i^l$.

Corollary 6 An algorithm for computing forecasts $p_i$ and a sequential method of randomization can be constructed such that, given a complexity bound $l > 0$, for any nontrivial stationary trading strategy $D \in \mathcal F$ such that $\|D\|_\infty \le l$

\[ \liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i^l\Delta S_i - \frac{1}{n}\sum_{i=1}^n D(x_i)\Delta S_i\right) \ge 0 \]

holds almost surely with respect to a probability distribution generated by the corresponding sequential randomization.

For any $\varepsilon > 0$, this trading strategy $M_i^l$ can be tuned such that for any $\delta > 0$, with probability at least $1-\delta$, for all nonnegative $D \in \mathcal F$ such that $\|D\|_\infty \le l$ and for all $n$

\[ \sum_{i=1}^n M_i^l\Delta S_i \ge \sum_{i=1}^n D(x_i)\Delta S_i - \frac{8}{3}(5e-2)\,l\,(c_{\mathcal F}^2+1)^{\frac14} n^{\frac34+\varepsilon} - \|D\|_{\mathcal F}\sqrt{(c_{\mathcal F}^2+1)n} - 2l\sqrt{\frac{n}{2}\ln\frac{2}{\delta}}. \]
5. Universal consistency 

Using a universal kernel and the corresponding canonical universal RKHS, we can extend our asymptotic results to all continuous stationary trading strategies.

An RKHS $\mathcal F$ on $X$ is universal if $X$ is a compact metric space and every continuous function $f$ on $X$ can be arbitrarily well approximated in the metric $\|\cdot\|_\infty$ by a function from $\mathcal F$: for any $\varepsilon > 0$ there exists $D \in \mathcal F$ such that

\[ \sup_{x\in X} |f(x) - D(x)| \le \varepsilon \]

(cf. Steinwart 2001, Definition 4).

We use $X = [0, 1]$. The Sobolev space $\mathcal F = H^1([0, 1])$ defined earlier is a universal RKHS (cf. Steinwart 2001, Vovk 2005a).

We call a randomized trading strategy $M_i$ universally consistent if for any continuous function $f$, with probability one,

\[ \liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i(S_i - S_{i-1}) - \frac{1}{n}\|f\|_\infty^{-1}\sum_{i=1}^n f(x_i)(S_i - S_{i-1})\right) \ge 0. \tag{46} \]

This definition is similar to Vovk's (2005a) definition of a universally consistent prediction strategy.

The existence of a universal RKHS on $[0, 1]$ implies the following.

Theorem 7 An algorithm for computing forecasts $p_i$ and a sequential method of randomization can be constructed which performs at least as well as any nontrivial continuous trading strategy $f$:

\[ \liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i\Delta S_i - \frac{1}{n}\|f\|_\infty^{-1}\sum_{i=1}^n f(x_i)\Delta S_i\right) \ge 0 \tag{47} \]

holds almost surely with respect to a probability distribution generated by the corresponding sequential randomization.

This result follows directly from the possibility of approximating any continuous function $f$ on $[0, 1]$ arbitrarily closely by a function $D$ from the universal RKHS $\mathcal F$: for any nontrivial continuous function $f$ and for any $0 < \varepsilon < 1$ take a nontrivial $D \in \mathcal F$ such that $\|f - D\|_\infty \le \frac12\varepsilon\|f\|_\infty$. Then

\[
\begin{aligned}
&\liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i\Delta S_i - \frac{1}{n}\|f\|_\infty^{-1}\sum_{i=1}^n f(x_i)\Delta S_i + \varepsilon\right) \ge \\
&\quad \ge \liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i\Delta S_i - \frac{1}{n}\|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)\Delta S_i\right) \ge 0.
\end{aligned}
\tag{48}
\]

Since (48) holds for each $\varepsilon > 0$, (47) is valid.
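The first inequality in (48) rests on a short normalization estimate, which can be spelled out as follows. This is a sketch under the assumption that the approximation error satisfies $\|f - D\|_\infty \le \frac12\varepsilon\|f\|_\infty$ and that $|\Delta S_i| \le 1$ (which holds because all prices lie in $[0,1]$). For every $x \in [0,1]$,

\[
\left|\frac{f(x)}{\|f\|_\infty} - \frac{D(x)}{\|D\|_\infty}\right|
\le \frac{|f(x) - D(x)|}{\|f\|_\infty} + \frac{|D(x)|}{\|D\|_\infty}\cdot\frac{\big|\|D\|_\infty - \|f\|_\infty\big|}{\|f\|_\infty}
\le \frac{2\|f - D\|_\infty}{\|f\|_\infty} \le \varepsilon,
\]

where we used $|D(x)| \le \|D\|_\infty$ and $\big|\|D\|_\infty - \|f\|_\infty\big| \le \|f - D\|_\infty$. Consequently

\[
\left|\frac{1}{n}\|f\|_\infty^{-1}\sum_{i=1}^n f(x_i)\Delta S_i - \frac{1}{n}\|D\|_\infty^{-1}\sum_{i=1}^n D(x_i)\Delta S_i\right| \le \frac{1}{n}\sum_{i=1}^n \varepsilon\,|\Delta S_i| \le \varepsilon,
\]

so replacing the normalized $D$ by the normalized $f$ changes the average gain by at most $\varepsilon$.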

The property of universal consistency is asymptotic and does not tell us anything about finite data sequences: we cannot obtain convergence bounds like (28) and (41), which hold for stationary strategies from an RKHS.



6. Competing with discontinuous trading strategies 

The trading strategy $M_i$ defined in Section 4 performs at least as well as any stationary trading strategy $D(x)$ (up to some regret) even if the future price $S_i$ of the stock is known to D as side information contained in $x_i$. Theorems 4 and 5 remain valid in this case.

This impressive efficiency of the trading strategy $M_i$ can be explained by the restrictive power of continuous functions. A weakness of Trader D is that his set of strategies is limited to $\mathcal F$. A continuous stationary trading strategy $D$ cannot respond sufficiently quickly to information about changes in the value of the future price $S_i$. The optimal trading strategy $M_i$, in contrast, is a discontinuous function, though it is applied to random variables.

A positive argument in favor of requiring continuity of $D$ is that it is natural to compete only with computable trading strategies, and continuity is often regarded as a necessary condition for computability (Brouwer's "continuity principle").



If $D$ is allowed to be discontinuous, we cannot prove (27) and (40) in the general case.

Let $M_i$ be an arbitrary randomized trading strategy. For simplicity, we assume that $M_1, M_2, \dots$ is a sequence of i.i.d. random variables.²

A stationary trading strategy $D(x)$ is called a decision rule if its range is finite. A decision rule is binary if it takes only two values.

Theorem 8 Let $M_i$ be an arbitrary i.i.d. sequence of random variables (a randomized trading strategy) such that $|M_i| \le 1$ for all $i$.

Consider the protocol of the trading game with two players presented in Fig. 3, with signals $x_i = P\{M_i > 0\}$ for all $i$.

Then a binary decision rule $D(x)$ and a sequence $S_1, S_2, \dots$ of prices can be defined such that with probability one

\[ \limsup_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i\Delta S_i - \frac{1}{2}\cdot\frac{1}{n}\sum_{i=1}^n D(x_i)\Delta S_i\right) \le 0, \tag{49} \]

where $\Delta S_i = S_i - S_{i-1}$. Inequality (49) means that the trading strategy $D$ outperforms $M_i$ by a factor of two.



Proof. Let $x_i = P\{M_i > 0\}$. We bound the mathematical expectation of the random variable $M_i$:

\[ E(M_i) = \int_{M_i > 0} M_i\,dP + \int_{M_i \le 0} M_i\,dP \le P\{M_i > 0\} \le x_i, \tag{50} \]

\[ E(M_i) \ge -P\{M_i \le 0\} = x_i - 1. \tag{51} \]

Define a sequence of stock prices: $S_0 = 1/2$ and for $i \ge 1$

\[ S_i = \begin{cases} S_{i-1} - 2^{-(i+1)} & \text{if } x_i > \frac12, \\ S_{i-1} + 2^{-(i+1)} & \text{otherwise.} \end{cases} \]

By definition $S_i \ge 0$ for all $i$. Define the decision rule $D$:

\[ D(x_i) = \begin{cases} -1 & \text{if } x_i > \frac12, \\ 1 & \text{otherwise.} \end{cases} \]

If $x_i > \frac12$ then $E(M_i) \ge -\frac12$ by (51), $\Delta S_i = -2^{-(i+1)}$, and $D(x_i) = -1$ by definition.

If $x_i \le \frac12$ then $E(M_i) \le \frac12$ by (50), $\Delta S_i = 2^{-(i+1)}$, and $D(x_i) = 1$ by definition. We have

\[ E\left(\sum_{i=1}^n M_i\Delta S_i\right) = \sum_{i=1}^n E(M_i)\Delta S_i = \sum_{x_i > \frac12} E(M_i)\Delta S_i + \sum_{x_i \le \frac12} E(M_i)\Delta S_i \le \frac12\sum_{i=1}^n 2^{-(i+1)} = \frac14\left(1 - 2^{-n}\right) \tag{52} \]

for all $n$, and

\[ \sum_{i=1}^n D(x_i)\Delta S_i = \sum_{x_i > \frac12} D(x_i)\Delta S_i + \sum_{x_i \le \frac12} D(x_i)\Delta S_i = \sum_{i=1}^n 2^{-(i+1)} = \frac12\left(1 - 2^{-n}\right). \tag{53} \]

By the martingale law of large numbers (Azuma-Hoeffding inequality), with probability one

\[ \frac{1}{n}\sum_{i=1}^n \big(M_i - E(M_i)\big) \to 0 \]

as $n \to \infty$. From this, (49) follows. Theorem 8 is proved.

2. The internal randomization is performed independently. Theorem 8 can be generalized to an arbitrary sequence $M_1, M_2, \dots$, not necessarily i.i.d.
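The adversarial construction used in this proof is simple enough to replay in code. The sketch below builds the price sequence and the binary decision rule from a given list of signals $x_i$; the function names `adversarial_prices` and `gain` are hypothetical, and the signals are passed in directly rather than derived from a distribution of $M_i$.

```python
from typing import List, Sequence, Tuple


def adversarial_prices(signals: Sequence[float], s0: float = 0.5) -> Tuple[List[float], List[int]]:
    """Build prices S_1..S_n and decisions D(x_1)..D(x_n) as in the proof.

    The price moves by 2^-(i+1) against the more likely sign of M_i,
    while the decision rule D always bets in the direction of the move.
    """
    prices, decisions = [], []
    prev = s0
    for i, x in enumerate(signals, start=1):
        step = 2.0 ** -(i + 1)
        if x > 0.5:
            prev -= step
            decisions.append(-1)
        else:
            prev += step
            decisions.append(1)
        prices.append(prev)
    return prices, decisions


def gain(decisions: Sequence[float], prices: Sequence[float], s0: float = 0.5) -> float:
    """Total gain sum_i D_i * (S_i - S_{i-1})."""
    total, prev = 0.0, s0
    for d, s in zip(decisions, prices):
        total += d * (s - prev)
        prev = s
    return total
```

For any signal sequence of length $n$ the rule $D$ gains exactly $\sum_{i=1}^n 2^{-(i+1)} = \frac12(1 - 2^{-n})$, and the prices stay inside $[0, 1]$ because the steps sum to less than $\frac12$.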

The discontinuous trading strategy $D$ defined in Theorem 8 is unstable under small changes of the signal $x_i$. In the next theorem, we show that if we randomly round the signal $x_i$, then our universal trading strategy $M_i$ (and $M_i^1$) performs at least as well as $D$.

Consider the protocol of the trading game with two players and side information $x_i \in [0, 1]$ (see Fig. 3).

We specify the information vector used by our universal strategy $M_i$ to be $\mathbf x_i = (S_{i-1}, x_i)$, where $S_{i-1}$ is the past price of the stock and $x_i$ is the signal at step $i$. The universal trading strategy $M_i$ uses the sequential method of randomization defined in Section 2 to produce a randomized forecast $\tilde p_i$ and a randomized information vector $\tilde{\mathbf x}_i = (\tilde S_{i-1}, \tilde x_i)$.

The strategy of Trader M is the same as before:

\[ M_i = \begin{cases} 1 & \text{if } \tilde p_i > \tilde S_{i-1}, \\ -1 & \text{otherwise,} \end{cases} \]

except that it uses a slightly different randomization.

Theorem 9 An algorithm for computing forecasts and a sequential method of randomization can be constructed such that for any nontrivial decision rule $D$

\[ \liminf_{n\to\infty}\left(\frac{1}{n}\sum_{i=1}^n M_i\Delta S_i - \frac{1}{n}\|D\|_\infty^{-1}\sum_{i=1}^n D(\tilde x_i)\Delta S_i\right) \ge 0 \tag{54} \]

holds almost surely with respect to a probability distribution generated by the corresponding sequential randomization.

Moreover, for any $\varepsilon > 0$ this trading strategy $M_i$ can be tuned such that for any $\delta > 0$, with probability at least $1-\delta$, for all nontrivial nonnegative decision rules $D \in \mathcal F$

\[ \sum_{i=1}^n M_i\Delta S_i \ge \|D\|_\infty^{-1}\sum_{i=1}^n D(\tilde x_i)\Delta S_i - 5(m+1)e\,n^{\frac45+\varepsilon} - \frac{5}{4}(m+1)(e-1)\,n^{\frac45+\varepsilon} - (m+1)\sqrt{\frac{n}{2}\ln\frac{2m}{\delta}} \tag{55} \]

for all $n$, where $m$ is the cardinality of the range of $D$.

Proof. For simplicity, we give the proof for the case of a nonnegative decision rule and the randomized strategy $M_i^1$. The case of an arbitrary decision rule $D$ and the strategy $M_i$ is considered similarly.

We apply Theorem 1 to the zero kernel $R(x, x') = 0$ with $c_{\mathcal F} = 0$ and to the information vector $\mathbf x_i = (S_{i-1}, x_i)$, $k = 2$.

Recall that $\varepsilon > 0$ and $M = \lceil 1/\varepsilon \rceil$. At any step $i$ we compute the deterministic forecast $p_i$ defined in Theorem 1 (Section 3) and its randomization $\tilde p_i$ using the parameters $\Delta = \Delta_s = (s+M)^{-\frac{M}{5}}$ and $n_s = (s+M)^M$, where $n_s \le i < n_{s+1}$.

The following upper bound is valid:

\[ \left|\sum_{i=1}^n I(\tilde p_i > \tilde S_{i-1})(\tilde S_{i-1} - S_{i-1})\right| \le \sum_{t=0}^{s}(n_{t+1}-n_t)\Delta_t \le \frac{5}{4}(e-1)\,n_s^{\frac45+\varepsilon} \le \frac{5}{4}(e-1)\,n^{\frac45+\varepsilon}. \tag{56} \]

Let $D(x)$ be an arbitrary nontrivial nonnegative decision rule. Let $M_i^1$ be the randomized trading strategy defined in Section 4. We use abbreviations:

\[ \nu_1(n) = (e-1)\frac{5}{4}\,n^{\frac45+\varepsilon}, \tag{57} \]

\[ \nu_2(n) = 4e\left(\frac{3}{2}\right)^{\frac25} n^{\frac45+\varepsilon} + \sqrt{\frac{n}{2}\ln\frac{2m}{\delta}}. \tag{58} \]

All sums below are for $i = 1, \dots, n$. By definition $0 \le D(\tilde x_i) \le \|D\|_\infty$ for all $\tilde x_i \in [0, 1]$. Let $d_1, \dots, d_m$ be all values of $D$. Define

\[ S_j = \{(p, y, x) : 0 \le p, y \le 1,\ D(x) = d_j\}, \]

where $j = 1, \dots, m$; let $I_{S_j}$ be the characteristic function of the set $S_j$.

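Let $\delta > 0$. Then, with probability $1-\delta$, the following chain of equalities and inequalities is valid:

\[
\begin{aligned}
\sum_{i=1}^n M_i^1(S_i - S_{i-1}) &= \sum_{\tilde p_i > \tilde S_{i-1}} (S_i - S_{i-1}) = \\
&= \sum_{\tilde p_i > \tilde S_{i-1}} (S_i - \tilde p_i) + \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) + \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde S_{i-1} - S_{i-1}) \ge && (59) \\
&\ge \sum_{\tilde p_i > \tilde S_{i-1}} (\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) \ge && (60) \\
&\ge \|D\|_\infty^{-1}\sum_{i=1}^n D(\tilde x_i)(\tilde p_i - \tilde S_{i-1}) - \nu_1(n) - \nu_2(n) = && (61) \\
&= \|D\|_\infty^{-1}\sum_{i=1}^n D(\tilde x_i)(\tilde p_i - S_i) + \|D\|_\infty^{-1}\sum_{i=1}^n D(\tilde x_i)(S_{i-1} - \tilde S_{i-1}) + \\
&\quad + \|D\|_\infty^{-1}\sum_{i=1}^n D(\tilde x_i)(S_i - S_{i-1}) - \nu_1(n) - \nu_2(n) \ge && (62) \\
&\ge \|D\|_\infty^{-1}\sum_{i=1}^n D(\tilde x_i)(S_i - S_{i-1}) - (1+m)\nu_1(n) - (1+m)\nu_2(n). && (63)
\end{aligned}
\]

In the change from (59) to (60) and in the change from (62) to (63) we have used the inequality (56). In the change from (62) to (63) we have also used Theorem 1, where $k = 2$, and the fact that, with probability $1-\delta$,

\[ \left|\sum_{i=1}^n D(\tilde x_i)(S_i - \tilde p_i)\right| \le \sum_{j=1}^m |d_j|\left|\sum_{i=1}^n I_{S_j}(\tilde p_i, \tilde S_{i-1}, \tilde x_i)(S_i - \tilde p_i)\right| \le m\|D\|_\infty\nu_2(n). \]

The inequality (54) follows from (55). Theorem 9 is proved.

Figure 4: Evolution of capitals of three trading strategies for the period 26.03.10-25.03.11: Buy and Hold (solid line), UN dealing for a rise (dotted line), UN dealing for a fall (dashed line). One run of trading is performed with the simulated stock TEST (see Table 1).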

Figure 5: Evolution of capitals of three trading strategies for the period 26.03.10-25.03.11: Buy and Hold (solid line), UN dealing for a rise (dotted line), UN dealing for a fall (dashed line). One run of trading is performed with the stock KOCO (see also Table 1).



7. Numerical experiments 

Computer technology. In the numerical experiments, we have used historical data in the form of per-minute time series of prices of arbitrarily chosen stocks.

Two types of kernel functions were used as smooth approximations of the combined kernel $K(p, x_n, p_i, x_i) + R(x_n, x_i)$ in the forecasting sum: (i) $K(p, p_n) = \cos(\pi(p - p_n)/2)$, (ii) $K(p, p_n, x, x_i) = \exp(c(p - p_n) + d(x - x_i))$, where $c$, $d$ are positive constants.

In any short-term trading algorithm, timing is crucial. The greatest time cost is associated with calculating these sums and finding the roots of the corresponding equation. The experiments show that the computation time for one forecast point increases linearly with the length of the history. To produce one forecast point within 1-3 seconds of CPU time, the length of the series was limited to 5000 points. For series longer than 5000 points, a "chain" method of forecasting was used: two processes working on overlapping intervals of the time series run at the same time (see Fig. 6).

Let $L_{max}$ be the chain length and $L_{shift}$ be the time shift, where $L_{shift} < L_{max}$. In any process, the first $L_{shift}$ time points are used only for scaling prices and for preliminary learning of the forecasting algorithm. No trading is performed at the first $L_{shift}$ time points of the series.

When a regular process terminates, we switch to the time point $L_{shift} + 1$ of the next process. The results of the parallel computations are accumulated into a single overall forecasting series. We set $L_{max} = 5000$ and $L_{shift} = 2000$.
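The bookkeeping of the chain method can be sketched as a list of windows. The code below is one possible reading of the description, not the author's implementation: it assumes that consecutive processes start $L_{max} - L_{shift}$ points apart, so that the trading segment of the next process begins exactly where the current process ends. The function name `chain_windows` and the 0-based, half-open index convention are assumptions made for the illustration.

```python
from typing import List, Tuple


def chain_windows(n_points: int, l_max: int = 5000, l_shift: int = 2000) -> List[Tuple[int, int, int]]:
    """Return (start, trade_start, end) index triples, 0-based and half-open.

    Points [start, trade_start) are used only for scaling and warm-up;
    trading forecasts are taken from [trade_start, end).
    """
    if not 0 < l_shift < l_max:
        raise ValueError("need 0 < l_shift < l_max")
    windows = []
    step = l_max - l_shift  # assumed offset between consecutive processes
    start = 0
    while start + l_shift < n_points:
        end = min(start + l_max, n_points)
        windows.append((start, start + l_shift, end))
        start += step
    return windows


if __name__ == "__main__":
    for window in chain_windows(11000):
        print(window)
```

With the default parameters each process warms up on 2000 points and trades on the following 3000, so the trading segments tile the series without gaps after the first warm-up interval.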

The prices of a stock are scaled such that $S_i \in [0, 1]$ for all $i$. The scaling is performed for the time series of each process separately. The first $L_{shift}$ time points of any process are used for computing a scaling constant. Prices are scaled as follows:

\[ S_i = \frac{\bar S_i}{c\cdot\max\limits_{1\le j\le L_{shift}} \bar S_j}, \]

where $1 \le i \le L_{max}$ and $\bar S_i$ are the real prices of the stock. We set $c = 14$.

Figure 6: Scheme of parallel computations

The forecasting algorithm is performed for the scaled prices $S_i$, where $L_{shift} + 1 \le i \le L_{max}$.

We implement this computer technology for two forecasting algorithms: the universal strategy constructed in Section 3 (UN model) and the Autoregressive Moving Average algorithm (ARMA model) (cf. Peng and Aston 2011).³



Results of numerical experiments. In the numerical experiments, we have used historical data in the form of per-minute time series of prices of 17 arbitrarily chosen stocks (11 US stocks and 6 Russian stocks) and of one simulated stock TEST. The data was downloaded from the FINAM site: www.finam.ru. The number of trading points in each game is N = 88000-116000 min (from March 26, 2010 to March 25, 2011).

The artificial stock TEST is simulated as $S_i = S_{i-1} + \xi_i$, $i = 1, 2, \dots, N$, where $\xi_i$ is a Gaussian random variable with mean 0 and variance equal to the variance of the scaled GAZP stock.

We implement the trading strategy defined in Section 4. Two series of numerical experiments were performed.

In the first series, we use the trading strategy $M_i$ studied in Section 4. At each step, starting from the initial capital $\mathcal K_0^{+} = \mathcal K_0^{-} = \mathcal K_0 = KS_0$, where $S_0$ is the price of the stock at the first time point, this strategy deals for a rise or for a fall with $K$ shares of the stock. We take $K = 5$ in our experiments. In the case of dealing for a rise, the capital changes at any step $i$ as $\mathcal K_i^{+} = \mathcal K_{i-1}^{+} + K(S_i - S_{i-1})$ if $\tilde p_i > \tilde S_{i-1}$ and $\mathcal K_i^{+} = \mathcal K_{i-1}^{+}$ otherwise. In the case of dealing for a fall, $\mathcal K_i^{-} = \mathcal K_{i-1}^{-} - K(S_i - S_{i-1})$ if $\tilde p_i \le \tilde S_{i-1}$ and $\mathcal K_i^{-} = \mathcal K_{i-1}^{-}$ otherwise, where $i = 1, 2, \dots, N$.
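The capital updates of the first series can be written down directly. The following sketch is an illustration only: it takes the forecasts as given and compares them with the unrandomized past price, so the randomization step of the UN model is not reproduced. The name `fixed_lot_capitals` is hypothetical.

```python
from typing import Sequence, Tuple


def fixed_lot_capitals(prices: Sequence[float], forecasts: Sequence[float],
                       k_shares: float = 5.0) -> Tuple[float, float]:
    """Return final capitals (rise, fall) when trading a fixed lot of K shares.

    prices[0] is S_0; forecasts[i - 1] is the forecast of prices[i].
    """
    k_rise = k_fall = k_shares * prices[0]  # initial capital K * S_0
    for i in range(1, len(prices)):
        delta = prices[i] - prices[i - 1]
        if forecasts[i - 1] > prices[i - 1]:
            k_rise += k_shares * delta   # dealing for a rise
        else:
            k_fall -= k_shares * delta   # dealing for a fall
    return k_rise, k_fall


if __name__ == "__main__":
    rise, fall = fixed_lot_capitals([1.0, 1.2, 1.1, 1.3], [1.1, 1.0, 1.4])
    print(rise, fall)
```

Because every step is assigned to exactly one of the two modes, the rise capital minus the fall capital always equals $K(S_N - S_0)$, the Buy-and-Hold gain.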

Results of the numerical experiments are shown in Table 1. In the first column, stock ticker symbols are shown. The second column contains the profit of the Buy-and-Hold trading strategy. With this strategy, we buy a holding of shares using the capital $\mathcal K_0$ and sell them for $\mathcal K_N$ at the end of the trading period.

3. See also the State Space Models Toolbox for MATLAB: http://sourceforge.net/projects/ssmodels

Table 1: Universal trading

Ticker      Buy&Hold    UN for a rise   UN for a fall   ARMA for a rise   ARMA for a fall
            Profit %    Profit %        Profit %        Profit %          Profit %
TEST          6.85         -1.39           -8.19            9.88              3.08
AT-T          7.71        145.21          137.51           53.74             46.038
CTGR         15.04       1711.47         1696.78         1534.72           1620.03
KOCO         16.55         69.84           53.32           31.35             14.83
GOOG         10.25        115.80          105.57           43.01             32.78
InBM         24.28         83.59           59.30           53.45             29.16
INTL          4.29        118.91          114.71           58.57             54.37
MSD          10.71         56.40           45.69           33.90             23.19
US1.AMT      22.01         28.37            6.40           28.97              7.00
US1.IP        2.40         30.12           27.75           -7.88            -10.24
US2.BRCM     25.30         62.53           37.19           49.20             23.86
US2.FSLR     40.15        159.11          118.81           13.73            -26.56
SIBN         -6.54        747.80          754.27          448.12            444.89
GAZP         22.75        100.01           77.26           37.89             15.14
LKOH         19.39        269.07          249.67          136.29            116.89
MTSI         -1.61        698.08          699.61          434.84            436.37
ROSN          9.69        197.03          187.26           93.99             84.22
SBER         14.21        112.05           97.98           32.62             18.55

In the 3rd and 4th columns, results of one run of trading based on the universal randomized forecasting strategy (UN) are shown. In the 3rd column, the relative return with respect to the initial capital, $\frac{\mathcal K_N - \mathcal K_0}{\mathcal K_0}\cdot 100\%$, is shown for dealing for a rise; in the 4th column, the same relative return is shown for dealing for a fall. In the 5th and 6th columns, the same results are shown for trading using ARMA forecasts.

It was found that $\mathcal K_i > 0$ for $i = 1, 2, \dots, N$, i.e., we never incur debt in our experiments (with the exception of the TEST stock).

Results presented in Table 2 show that trading based on the UN forecasting model performs at least as well as trading based on the ARMA forecasting model and substantially outperforms it for some stocks.

The second series of experiments is closer to real short-term trading. The trading strategy has a defensive guarantee. Starting with the same initial capital $\mathcal K_0 = KS_0$, where $S_0$ is the initial price of the stock and $K = 5$, we deal for a rise using a "defensive" trading strategy. At any step $i$, our working capital is $C_{i-1} = \min\{\mathcal K_0, \mathcal K_{i-1}\}$. Using this capital, we buy $M_i = C_{i-1}/S_{i-1}$ shares of the stock at the beginning of step $i$ if $C_{i-1} > 0$, and stop trading otherwise: $M_i = 0$. We update the cumulative capital at the end of each step: $\mathcal K_i = \mathcal K_{i-1} + M_i(S_i - S_{i-1})$. Thereby, we can set aside the extra income.
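The defensive rule can be sketched as follows. Two points are assumptions of this illustration rather than statements of the paper: the position is entered only at steps where the forecast exceeds the past price (the market entry condition mentioned for Table 2), and transaction costs are left out because the text does not specify how they are charged. The forecasts are again taken as given and are not randomized; the name `defensive_capital` is hypothetical.

```python
from typing import Sequence


def defensive_capital(prices: Sequence[float], forecasts: Sequence[float],
                      k_shares: float = 5.0) -> float:
    """Return the final cumulative capital K_N of the defensive strategy.

    prices[0] is S_0; forecasts[i - 1] is the forecast of prices[i].
    """
    k0 = k_shares * prices[0]   # initial capital K * S_0
    capital = k0
    for i in range(1, len(prices)):
        working = min(k0, capital)  # C_{i-1}: never risk more than the initial capital
        if working > 0 and forecasts[i - 1] > prices[i - 1]:
            shares = working / prices[i - 1]  # M_i = C_{i-1} / S_{i-1}
        else:
            shares = 0.0                      # stay out of the market
        capital += shares * (prices[i] - prices[i - 1])
    return capital


if __name__ == "__main__":
    print(defensive_capital([2.0, 2.2, 2.0], [2.5, 2.5]))
```

Capping the working capital at $\mathcal K_0$ means that accumulated profit is not reinvested, which is the sense in which the extra income is set aside.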

Results of the second series of numerical experiments are shown in Table 2. In the first column, stock ticker symbols are shown. The second column contains the relative return of the Buy-and-Hold trading strategy. In the next pair of columns marked "UN", relative returns of one run of randomized trading, as a percentage of the initial capital, are presented for the case with no transaction costs and for the case where a transaction cost at the rate of 0.01% is subtracted. We compute the forecast of a future stock price by the method of calibration and defensive forecasting (UN) presented in Theorem 1.

The next two columns marked "ARMA" are similar, except that the ARMA forecasting model is used for computing forecasts. The frequencies of market entry steps $i$, where $\tilde p_i > \tilde S_{i-1}$, are given in the next two columns marked "In" (for UN and ARMA). We sell all shares of the stock at step $i$ in case $\tilde p_i \le \tilde S_{i-1}$. The average time spent in the market is shown in the last two columns marked "D" (for UN and ARMA).

Table 2: Defensive trading

Ticker   Buy&Hold   UN         UN Profit   ARMA       ARMA Profit   UN      ARMA    UN      ARMA
         %          Profit %   -0.01%      Profit %   -0.01%        In      In      D       D
TEST       6.85        3.58     -80.93        3.58      -80.90      0.232   0.163   1.453   1.890
AT-T       7.71       69.01     -79.19       29.86      -79.19      0.218   0.205   1.611   1.576
CTGR      15.04     1030.12     658.13      937.46      540.18      0.238   0.253   1.654   1.479
KOCO      16.55       36.47     -78.62       15.69      -78.55      0.216   0.198   1.604   1.502
GOOG      10.25       46.54     -80.57        3.53      -82.68      0.231   0.211   1.462   1.474
InBM      24.28       54.79     -78.53       34.66      -78.10      0.219   0.187   1.514   1.517
INTL       4.29       43.06     -76.60        5.63      -76.28      0.220   0.179   1.630   1.585
MCD       10.71       34.22     -78.56       19.21      -78.41      0.222   0.190   1.571   1.876
AMT       22.01       16.47     -77.01       24.04      -77.09      0.212   0.183   1.654   1.758
IP         2.40        4.45     -82.78      -14.79      -81.06      0.213   0.181   1.657   1.760
BRCM      25.30       11.40     -80.47       23.98      -76.10      0.216   0.172   1.585   1.876
FLSR      40.15       21.02     -80.04      -27.50      -80.03      0.227   0.196   1.499   1.506
SIBN      -6.54      600.62     249.87      287.48      -58.55      0.169   0.179   2.460   2.292
GAZP      22.75       51.29     -82.04        4.34      -82.16      0.224   0.210   1.539   1.526
LKOH      19.39      149.03     -79.91       46.44      -80.62      0.230   0.244   1.527   1.501
MTSI      -1.61      482.83      79.23      275.13      -69.36      0.188   0.195   2.174   1.959
ROSN       9.69      101.15     -83.14       -0.53      -83.54      0.228   0.240   1.549   1.499
SBER      14.21       51.56     -82.52      -14.47      -82.73      0.225   0.196   1.559   1.674

8. Conclusion 

Asymptotic calibration is an area of intensive research in which several algorithms for computing well-calibrated forecasts have been developed. Several applications of well-calibrated forecasting have been proposed (convergence to correlated equilibrium, recovering unknown functional dependencies, prediction with expert advice). We present a new application of the calibration method.

We show that a universal trading strategy can be constructed using well-calibrated forecasts. We prove that this strategy performs at least as well as any stationary trading strategy given by a rule from any RKHS, with regret $O(n^{3/4+\varepsilon})$. Using a universal kernel, we prove that this strategy performs at least as well as any continuous stationary trading strategy.

The obvious drawback of the universal strategy is that it relies on high-frequency trading, which prevents its practical application in the presence of transaction costs.

To construct the universal trading strategy, we generalize Kakade and Foster's algorithm and combine it with Vovk's DF-model for an arbitrary RKHS. Using Vovk's (2006) theory of defensive forecasting in Banach spaces, these results can be generalized to such spaces.

Unlike in statistical theory, no stochastic assumptions are made about the stock prices.

Numerical experiments show a positive return for all chosen stocks, and for some of them we obtain a positive return even when transaction costs are subtracted. Results of this type can be useful for technical analysis in finance.
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